Field of the Invention
The present invention provides recombinant methods and materials for 
producing polyketides by recombinant DNA technology. The invention relates to 
the fields of agriculture, animal husbandry, chemistry, medicinal chemistry,
medicine, molecular biology, pharmacology, and veterinary technology. 



Background of the Invention 
Polyketides represent a large family of diverse compounds synthesized
from 2-carbon units through a series of condensations and subsequent
modifications. Polyketides occur in many types of organisms, including fungi and
mycelial bacteria, in particular, the actinomycetes. There is a wide variety of
polyketide structures, and the class of polyketides encompasses numerous
compounds with diverse activities. Erythromycin, FK-506, FK-520, megalomicin,
narbomycin, oleandomycin, picromycin, rapamycin, spinosyn, and tylosin are
examples of such compounds. Given the difficulty in producing polyketide
compounds by traditional chemical methodology, and the typically low production
of polyketides in wild-type cells, there has been considerable interest in finding
improved or alternate means to produce polyketide compounds. See PCT
publication Nos. WO 93/13663; WO 95/08548; WO 96/40968; WO 97/02358;
and WO 98/27203; United States Patent Nos. 4,874,748; 5,063,155; 5,098,837;
5,149,639; 5,672,491; and 5,712,146; Fu et al., 1994, Biochemistry 33: 9321-9326;
McDaniel et al., 1993, Science 262: 1546-1550; and Rohr, 1995, Angew.




WO 01/27284 PCT/US00/27433 

Chem. Int. Ed. Engl. 34(8): 881-888, each of which is incorporated herein by
reference. 

Polyketides are synthesized in nature by polyketide synthase (PKS)
enzymes. These enzymes, which are complexes of multiple large proteins, are
similar to the synthases that catalyze condensation of 2-carbon units in the
biosynthesis of fatty acids. PKS enzymes are encoded by PKS genes that usually
consist of three or more open reading frames (ORFs). Two major types of PKS
enzymes are known; these differ in their composition and mode of synthesis.
These two major types of PKS enzymes are commonly referred to as Type I or
"modular" and Type II or "iterative" PKS enzymes.

Modular PKSs are responsible for producing a large number of 12-, 14-,
and 16-membered macrolide antibiotics including erythromycin, megalomicin,
methymycin, narbomycin, oleandomycin, picromycin, and tylosin. Each ORF of a
modular PKS can comprise one, two, or more "modules" of ketosynthase activity,
each module of which consists of at least two (if a loading module) and more
typically three (for the simplest extender module) or more enzymatic activities or
"domains." These large multifunctional enzymes (>300 kDa) catalyze the
biosynthesis of polyketide macrolactones through multistep pathways involving
decarboxylative condensations between acyl thioesters followed by cycles of
varying β-carbon processing activities (see O'Hagan, D. The polyketide
metabolites; E. Horwood: New York, 1991, incorporated herein by reference).
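The module and domain organization described above can be sketched as a small data model. This is an illustrative sketch only, not part of the disclosed invention. It assumes that a loading module minimally carries AT and ACP domains and that the simplest extender module carries KS, AT, and ACP domains; the mapping from optional KR, DH, and ER domains to a β-carbon state follows the general textbook model and is likewise an assumption. The names `Module`, `is_valid`, and `beta_carbon_state` are hypothetical.

```python
from dataclasses import dataclass

# Assumed minimal domain sets (not stated explicitly in the text above).
REQUIRED_LOADING = {"AT", "ACP"}          # loading module: at least two domains
REQUIRED_EXTENDER = {"KS", "AT", "ACP"}   # simplest extender module: three domains


@dataclass(frozen=True)
class Module:
    name: str
    domains: frozenset
    loading: bool = False

    def is_valid(self) -> bool:
        """Check that the module carries its assumed minimal domain set."""
        required = REQUIRED_LOADING if self.loading else REQUIRED_EXTENDER
        return required <= self.domains

    def beta_carbon_state(self) -> str:
        """Textbook outcome of beta-carbon processing (an assumption)."""
        if self.loading:
            return "n/a"
        if "KR" not in self.domains:
            return "keto"        # no reduction
        if "DH" not in self.domains:
            return "hydroxyl"    # ketoreduction only
        if "ER" not in self.domains:
            return "alkene"      # reduction plus dehydration
        return "methylene"       # full reductive cycle


# Hypothetical three-module example; this is not the megalomicin PKS.
example = [
    Module("load", frozenset({"AT", "ACP"}), loading=True),
    Module("ext1", frozenset({"KS", "AT", "KR", "ACP"})),
    Module("ext2", frozenset({"KS", "AT", "KR", "DH", "ER", "ACP"})),
]
states = [m.beta_carbon_state() for m in example]
```

Modeling each module as a set of domain labels keeps the sketch independent of domain order, which is sufficient for checking composition but would not capture the linear arrangement of domains within a protein.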

During the past half decade, the study of modular PKS function and
specificity has been greatly facilitated by the plasmid-based Streptomyces
coelicolor expression system developed with the 6-deoxyerythronolide B (6-dEB)
synthase (DEBS) genes (see Kao et al., 1994, Science 265: 509-512, McDaniel et
al., 1993, Science 262: 1546-1550, and U.S. Patent Nos. 5,672,491 and
5,712,146, each of which is incorporated herein by reference). The advantages of
this plasmid-based genetic system for DEBS are that it overcomes the tedious and
limited techniques for manipulating the natural DEBS host organism,
Saccharopolyspora erythraea, allows more facile construction of recombinant
PKSs, and reduces the complexity of PKS analysis by providing a "clean" host
background. This system also expedited construction of the first combinatorial
modular polyketide library in Streptomyces (see PCT publication No. WO
98/49315, incorporated herein by reference).

The ability to control aspects of polyketide biosynthesis, such as monomer
selection and degree of β-carbon processing, by genetic manipulation of PKSs has
stimulated great interest in the combinatorial engineering of novel antibiotics (see
Hutchinson, 1998, Curr. Opin. Microbiol. 1: 319-329; Carreras and Santi, 1998,
Curr. Opin. Biotech. 9: 403-411; and U.S. Patent Nos. 5,712,146 and 5,672,491,
each of which is incorporated herein by reference). This interest has resulted in the
cloning, analysis, and manipulation by recombinant DNA technology of genes that
encode PKS enzymes. The resulting technology allows one to manipulate a known
PKS gene cluster either to produce the polyketide synthesized by that PKS at
higher levels than occur in nature or in hosts that otherwise do not produce the
polyketide. The technology also allows one to produce molecules that are
structurally related to, but distinct from, the polyketides produced from known
PKS gene clusters.

Megalomicin is a macrolide antibiotic produced by Micromonospora
megalomicea, a member of the Actinomycetales family of soil bacteria that
produces many types of biologically active compounds. Megalomicin is a
glycoside of erythromycin A, a widely used antibacterial drug with little or no
antimalarial activity. Megalomicin has antibacterial properties similar to those of
erythromycin, and in 1998, it was discovered also to have potent antiparasitic
activity and low toxicity. The antiparasitic activity may be related to the effect
megalomicin has on protein trafficking in eukaryotes, where it appears to inhibit
vesicular transport between the medial and trans-Golgi, resulting in
under-sialylation of proteins. Hence, megalomicin offers an exciting opportunity to
develop a new class of antiparasitic drugs with a different mechanism of action
than the drugs currently in use and, therefore, possibly active against drug-resistant
forms of Plasmodium falciparum.

The number and diversity of megalomicin derivatives have been limited
due to the inability to manipulate the PKS genes, which have not previously been
available in recombinant form. Genetic systems that allow rapid engineering of the
megalomicin biosynthetic genes would be valuable for creating novel compounds
for pharmaceutical, agricultural, and veterinary applications. The production of
such compounds could be more readily accomplished if the heterologous
expression of the megalomicin biosynthetic genes in Streptomyces coelicolor,
S. lividans, and other host cells were possible. The present invention meets these
and other needs.


Summary of the Invention 
The present invention provides recombinant methods and materials for
expressing PKS enzymes and polyketide modification enzymes derived in whole
and in part from the megalomicin biosynthetic genes in recombinant host cells.
The invention also provides the polyketides produced by such PKS enzymes. The
invention provides in recombinant form all of the genes for the proteins that
constitute the complete PKS that ultimately results, in Micromonospora
megalomicea, in the production of megalomicin. Thus, in one embodiment, the
invention is directed to recombinant materials comprising nucleic acids with
nucleotide sequences encoding at least one domain, module, or protein encoded by
a megalomicin PKS gene. In one preferred embodiment of the invention, the DNA
compounds of the invention comprise a coding sequence for at least one and
preferably two or more of the domains of the loading module and extender
modules 1 through 6, inclusive, of the megalomicin PKS.

In one embodiment, the invention provides a recombinant expression
vector that comprises a heterologous promoter positioned to drive expression of
one or more of the megalomicin biosynthetic genes. In a preferred embodiment,
the promoter is derived from another PKS gene. In a related embodiment, the
invention provides recombinant host cells comprising one or more expression
vectors that produce(s) megalomicin or a megalomicin derivative or precursor. In
a preferred embodiment, the host cell is Streptomyces lividans or S. coelicolor.

In another embodiment, the invention provides a recombinant expression
vector that comprises a promoter positioned to drive expression of a hybrid PKS
comprising all or part of the megalomicin PKS and at least a part of a second PKS.
In a related embodiment, the invention provides recombinant host cells
comprising the vector that produces the hybrid PKS and its corresponding
polyketide. In a preferred embodiment, the host cell is Streptomyces lividans or S.
coelicolor.

In a related embodiment, the invention provides recombinant materials for
the production of libraries of polyketides wherein the polyketide members of the
library are synthesized by hybrid PKS enzymes of the invention. The resulting
polyketides can be further modified to convert them to other useful compounds,
such as antibiotics, motilides, and antiparasitics, typically through hydroxylation
and/or glycosylation. Modified macrolides provided by the invention that are
useful intermediates in the preparation of antiparasitics are of particular benefit.

In another related embodiment, the invention provides a method to prepare
a nucleic acid that encodes a modified PKS, which method comprises using the
megalomicin PKS encoding sequence as a scaffold and modifying the portions of
the nucleotide sequence that encode enzymatic activities, either by mutagenesis,
inactivation, deletion, insertion, or replacement. The thus modified megalomicin
PKS encoding nucleotide sequence can then be expressed in a suitable host cell
and the cell employed to produce a polyketide different from that produced by the
megalomicin PKS. In addition, portions of the megalomicin PKS coding sequence
can be inserted into other PKS coding sequences to modify the products thereof.

In another related embodiment, the invention is directed to a multiplicity of
cell colonies, constituting a library of colonies, wherein each colony of the library
contains an expression vector for the production of a modular PKS derived in
whole or in part from the megalomicin PKS. Thus, at least a portion of the
modular PKS is identical to that found in the PKS that produces megalomicin and
is identifiable as such. The derived portion can be prepared synthetically or
directly from DNA derived from organisms that produce megalomicin. In
addition, the invention provides methods to screen the resulting polyketide and
antibiotic libraries.

The invention also provides novel polyketides, motilides, antibiotics,
antiparasitics and other useful compounds derived therefrom. The compounds of
the invention can also be used in the manufacture of another compound. In a
preferred embodiment, the compounds of the invention are formulated in a
mixture or solution for administration to an animal or human.

In a specific embodiment, the invention provides an isolated nucleic acid
fragment comprising a nucleotide sequence encoding a domain of megalomicin
polyketide synthase (PKS) or a megalomicin modification enzyme. The isolated
nucleic acid fragment can be a DNA or an RNA. Preferably, the isolated nucleic
acid fragment is a recombinant DNA compound.

The isolated nucleic acid fragment can comprise a single, multiple, or all of
the open reading frames (ORFs) of the megalomicin PKS or a megalomicin
modification enzyme. Exemplary ORFs of megalomicin PKS include the ORFs of
the megAI, megAII and megAIII genes. The isolated nucleic acid fragment can
also encode a single, multiple, or all of the domains of the megalomicin PKS.
Exemplary domains of the megalomicin PKS include a TE domain, a KS domain,
an AT domain, an ACP domain, a KR domain, a DH domain and an ER domain.
In a preferred embodiment, the nucleic acid fragment encodes a module of the
megalomicin PKS. In another preferred embodiment, the nucleic acid fragment
encodes the loading module, a thioesterase domain, and all six extender modules
of the megalomicin PKS.

Megalomicin modification enzymes include those enzymes involved in the
conversion of 6-dEB into a megalomicin such as the enzymes encoded by the
megF, megBV, megCIII, megK, megDI and megG (renamed megY) genes.
Megalomicin modification enzymes also include those enzymes involved in the
biosynthesis of mycarose, megosamine or desosamine, which are used as
biosynthetic intermediates in the biosynthesis of various megalomicin species and
other related polyketides. The enzymes that are involved in biosynthesis of
mycarose, megosamine or desosamine are described in Figures 5 and 10.

In a preferred embodiment, the invention provides an isolated nucleic acid
fragment which hybridizes to a nucleic acid having a nucleotide sequence set forth
in SEQ ID NO: 1, under low, medium or high stringency. More preferably, the
nucleic acid fragment comprises, consists of, or consists essentially of a nucleic
acid having a nucleotide sequence set forth in SEQ ID NO: 1.

In another specific embodiment, the invention provides a substantially
purified polypeptide, which is encoded by a nucleic acid fragment comprising a
nucleotide sequence encoding a domain of megalomicin polyketide synthase
(PKS) or a megalomicin modification enzyme. The polypeptide can comprise a
single domain, multiple domains or a full-length megalomicin PKS or
megalomicin modification enzyme. Functional fragments, analogs or derivatives
of the megalomicin PKS or megalomicin modification enzyme polypeptides are
also provided. Preferably, such fragments, analogs or derivatives can be
recognized by an antibody raised against a megalomicin PKS or megalomicin
modification enzyme. Also preferably, such fragments, analogs or derivatives
comprise an amino acid sequence that has at least 60% identity, more preferably at
least 90% identity, to their wild type counterparts.

In still another specific embodiment, the invention provides an antibody, or
a fragment or derivative thereof, which immuno-specifically binds to a domain of
megalomicin polyketide synthase (PKS) or a megalomicin modification enzyme.
The antibody can be a monoclonal or polyclonal antibody or an antibody fragment.
Preferably, the antibody is a monoclonal antibody.

In yet another specific embodiment, the invention provides a recombinant
DNA expression vector comprising the recombinant DNA compound encoding at
least a domain of the megalomicin PKS or a megalomicin modification enzyme,
wherein said domain is operably linked to a promoter. Preferably, the
recombinant DNA expression vector further comprises an origin of replication or a
segment of DNA that enables chromosomal integration.

In yet another specific embodiment, the invention provides a recombinant
host cell comprising the above-described recombinant DNA expression vector
encoding at least a domain of megalomicin PKS or the megalomicin modification
enzyme. The recombinant host cells can be any suitable host cells including
animal, mammalian, plant, fungal, yeast, and bacterial cells. Preferably, the
recombinant host cells are Streptomyces cells, such as Streptomyces lividans and
S. coelicolor cells, or Saccharopolyspora cells, such as Saccharopolyspora erythraea
cells. Also preferably, the recombinant host cells do not produce megalomicin in
their untransformed, non-recombinant state.

When the recombinant host cell contains nucleic acid encoding more than
one megalomicin PKS or megalomicin modification enzyme, or domains thereof,
such nucleic acid material can be located at a single genetic locus, e.g., on a single
plasmid or at a single chromosomal locus, or at different genetic loci, e.g., on
separate plasmids and/or chromosomal loci. In one example, the invention
provides a recombinant host cell, which comprises at least two separate
autonomously replicating recombinant DNA expression vectors, and each of said
vectors comprises a recombinant DNA compound encoding a megalomicin PKS
domain or a megalomicin modification enzyme operably linked to a promoter. In
another example, the invention provides a recombinant host cell, which comprises
at least one autonomously replicating recombinant DNA expression vector and at
least one modified chromosome, wherein each of said vector(s) and each of said
modified chromosome(s) comprises a recombinant DNA compound encoding a
megalomicin PKS domain or a megalomicin modification enzyme operably linked
to a promoter. Preferably, the autonomously replicating recombinant DNA
expression vector and/or the modified chromosome further comprise distinct
selectable markers.

In a preferred embodiment, the cell comprises three different vectors, one
of which is integrated into the chromosome and two of which are autonomously
replicating, and each of the vectors comprises a meg PKS gene. Optionally, one or
more of the meg PKS genes contains one or more domain alterations, such as a
deletion or substitution of a meg PKS domain with a domain from another PKS.

In yet another specific embodiment, the invention provides a hybrid PKS,
which is produced from a recombinant gene that comprises at least a portion of a
megalomicin PKS gene and at least a portion of a second PKS gene for a
polyketide other than megalomicin. For example, and without limitation, the
second PKS gene can be a narbonolide PKS gene, an oleandolide PKS gene, or a
rapamycin PKS gene. In one embodiment, the hybrid PKS is composed of a
loading module and six extender modules, wherein at least one domain of any one
of extender modules 1 through 6, inclusive, is a domain of an extender module of
megalomicin PKS. In another preferred embodiment, the hybrid PKS comprises a
megalomicin PKS that has a non-functional KS domain in module 1.

In yet another specific embodiment, the invention provides a method of
producing a polyketide, which method comprises growing the recombinant host
cell comprising a recombinant DNA expression vector encoding at least a domain
of the megalomicin PKS or a megalomicin modification enzyme under conditions
whereby the megalomicin PKS domain or the megalomicin modification enzyme
comprised by the recombinant expression vector is produced and the polyketide is
synthesized by the cell, and recovering the synthesized polyketide. Preferably, the
recombinant host cell comprises a recombinant expression vector that encodes at
least a portion of a megAI, megAII, or megAIII gene.
These and other embodiments of the invention are described in more detail
in the following description, the examples, and claims set forth below.

Brief Description of the Figures 
Figure 1 shows restriction site and function maps of the insert DNA in
cosmids pKOS079-138B, pKOS079-93D, pKOS079-93A, and pKOS079-124B of
the invention. Various restriction sites (XhoI, BglII, NsiI) are also shown. The
location of the megalomicin biosynthetic genes is shown below the solid lines
indicating the cosmid inserts. The genes are shown as arrows pointing in the
direction of transcription. The approximate size (in kilobase (kb) pairs) of the gene
cluster is indicated in 5000 bp (i.e., 5K, 10K, and the like) increments on a solid
bar beneath the arrows indicating the genes.

Figure 2 shows a more detailed map of the megalomicin biosynthetic gene
cluster. The various open reading frames are shown as arrows pointing in the
direction of transcription. A line indicates the size in base pairs (in 1000 bp
increments) of the gene cluster. The various domains of the megalomicin PKS are
also shown. Other genes of the megalomicin biosynthetic gene cluster not shown
in this Figure are located in the insert DNA of cosmids pKOS079-138B and
pKOS079-124B.

Figure 3 shows the structures of the megalomicins, azithromycin and
erythromycin A.

Figure 4 shows the modules and domains of DEBS and the megalomicin
PKS.

Figure 5 shows the compounds and reactions in the erythromycin
biosynthetic pathway and also for megalomicin biosynthesis. Genes that produce
the various enzymes that catalyze each of the steps in the biosynthetic pathway are
indicated.

Figure 6 shows the biosynthetic pathway for the formation of desosamine,
rhodosamine, and mycarose, as well as the genes that produce the various enzymes
that catalyze each of the steps in the biosynthetic pathway.

Figure 7 depicts nucleotide and amino acid sequence of Micromonospora
megalomicea megalomicin biosynthetic genes (GenBank Accession No.
AF263245, incorporated herein by reference).

Figure 8 depicts the biosynthesis of the erythromycins and megalomicins
and the enzymes that mediate the biosynthesis of each.

Figure 9 depicts the cloned megalomicin biosynthetic gene cluster and 
certain cosmids of the invention that comprise portions of the cluster. 

Figure 10 depicts the biosynthesis of megosamine, mycarose, and 
desosamine. 



The present invention provides useful compounds and methods for 
producing polyketides in recombinant host cells. As used herein, the term 
recombinant refers to a compound or composition produced by human 
intervention. The invention provides recombinant DNA compounds encoding all 
or a portion of the megalomicin biosynthetic genes. The invention provides 
recombinant expression vectors useful in producing the megalomicin PKS and 
hybrid PKSs composed of a portion of the megalomicin PKS in recombinant host 
cells. The invention also provides the polyketides produced by the recombinant 
PKS and polyketide modification enzymes. 

To appreciate the many and diverse benefits and applications of the 
invention, the description of the invention below is organized as follows. In 
Section I, common definitions used throughout this application are provided. In 
Section II, structural and functional characteristics of megalomicin are described. 
In Section III, the recombinant megalomicin biosynthetic genes and other 
recombinant nucleic acids provided by the invention are described. In Section IV, 
polypeptides and proteins encoded by the megalomicin biosynthetic genes and 
antibodies that specifically bind to such polypeptides and proteins provided by the 
invention are described. In Section V, methods for heterologous expression of the 
megalomicin biosynthetic genes provided by the invention are described. In 
Section VI, the hybrid PKS genes provided by the invention are described. In 
Section VII, host cells containing multiple megalomicin biosynthetic genes and
nucleic acid fragments on separate expression vectors provided by the invention are
described. In Section VIII, the polyketide compounds provided by the invention 
and pharmaceutical compositions of those compounds are described. The detailed 
description is followed by working examples illustrating the invention. 



Detailed Description of the Invention 
Unless defined otherwise, all technical and scientific terms used herein
have the same meaning as is commonly understood by one of ordinary skill in the
art to which this invention belongs. All patents, applications, published
applications and other publications and sequences from GenBank and other
databases referred to herein are incorporated by reference in their entirety.



Section I. Definitions 

As used herein, domain refers to a portion of a molecule, e.g., proteins or 
nucleic acids, that is structurally and/or functionally distinct from another portion 
of the molecule.

As used herein, antibody includes antibody fragments, such as Fab 
fragments, which are composed of a light chain and the variable region of a heavy 
chain.

As used herein, biological activity refers to the in vivo activities of a 
compound or physiological responses that result upon in vivo administration of a
compound, composition or other mixture. Biological activity, thus, encompasses 
therapeutic effects and pharmaceutical activity of such compounds, compositions 
and mixtures. Biological activities may be observed in in vitro systems designed 
to test or use such activities. 

As used herein, a combination refers to any association between two or
among more items.

As used herein, a composition refers to any mixture. It may be a solution, 
a suspension, liquid, powder, a paste, aqueous, non-aqueous or any combination 
thereof. 

As used herein, derivative or analog of a molecule refers to a portion
derived from or a modified version of the molecule.

As used herein, operably linked, operatively linked or operationally
associated refers to the functional relationship of DNA with regulatory and
effector sequences of nucleotides, such as promoters, enhancers, transcriptional
and translational stop sites, and other signal sequences. For example, operative
linkage of DNA to a promoter refers to the physical and functional relationship
between the DNA and the promoter such that the transcription of such DNA is
initiated from the promoter by an RNA polymerase that specifically recognizes,
binds to and transcribes the DNA. To optimize expression and/or in vitro
transcription, it may be helpful to remove, add or alter 5' untranslated portions of
the clones to eliminate extra, potentially inappropriate alternative translation
initiation (i.e., start) codons or other sequences that may interfere with or reduce
expression, either at the level of transcription or translation. Alternatively,
consensus ribosome binding sites (see, e.g., Kozak, J. Biol. Chem., 266:19867-
19870 (1991)) can be inserted immediately 5' of the start codon and may enhance
expression. The desirability of (or need for) such modification may be empirically
determined.

As used herein, pharmaceutically acceptable salts, esters or other
derivatives of the conjugates include any salts, esters or derivatives that may be
readily prepared by those of skill in this art using known methods for such
derivatization and that produce compounds that may be administered to animals or
humans without substantial toxic effects and that either are pharmaceutically
active or are prodrugs.

As used herein, a promoter region or promoter element refers to a segment
of DNA or RNA that controls transcription of the DNA or RNA to which it is
operatively linked. The promoter region includes specific sequences that are
sufficient for RNA polymerase recognition, binding and transcription initiation.
This portion of the promoter region is referred to as the promoter. In addition, the
promoter region includes sequences that modulate this recognition, binding and
transcription initiation activity of RNA polymerase. These sequences may be
cis-acting or may be responsive to trans-acting factors. Promoters, depending upon
the nature of the regulation, may be constitutive or regulated.
As used herein, stringency of hybridization in determining percentage
mismatch is as follows: (1) high stringency: 0.1 x SSPE, 0.1% SDS, 65°C; (2)
medium stringency: 0.2 x SSPE, 0.1% SDS, 50°C; and (3) low stringency: 1.0 x
SSPE, 0.1% SDS, 50°C. Equivalent stringencies may be achieved using alternative
buffers, salts and temperatures.
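For convenience, the three stringency conditions defined above can be held in a simple lookup table. The following is a minimal sketch for illustration only; the names `WashCondition`, `STRINGENCY`, and `describe` are hypothetical, and the values are copied directly from the definition above.

```python
from typing import NamedTuple


class WashCondition(NamedTuple):
    sspe_x: float       # SSPE concentration, as a multiple of 1 x SSPE
    sds_percent: float  # SDS concentration in percent
    temp_c: int         # temperature in degrees Celsius


# Values taken from the stringency definition in the text.
STRINGENCY = {
    "high": WashCondition(0.1, 0.1, 65),
    "medium": WashCondition(0.2, 0.1, 50),
    "low": WashCondition(1.0, 0.1, 50),
}


def describe(level: str) -> str:
    """Return a one-line description of the named stringency level."""
    c = STRINGENCY[level.lower()]
    return f"{c.sspe_x:g} x SSPE, {c.sds_percent:g}% SDS, {c.temp_c} C"
```

Note that medium and low stringency share the same temperature and differ only in SSPE concentration, whereas high stringency changes both parameters.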
The term substantially identical or homologous or similar varies with the
context as understood by those skilled in the relevant art and generally means at 
least 70%, preferably means at least 80%, more preferably at least 90%, and most 
preferably at least 95% identity. 
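The identity thresholds above can be illustrated with a short calculation. This sketch is illustrative only and rests on assumptions not stated in the text: the two sequences are assumed to be already aligned to equal length with `-` marking gaps, and identity is counted over all aligned columns that are not gaps in both sequences. Other conventions for the denominator exist and would give different percentages. The function names and tier labels are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns (assumed convention, see lead-in)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Drop columns that are gaps in both sequences.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# Thresholds from the definition above, highest first.
TIERS = [
    (95.0, "most preferred"),
    (90.0, "more preferred"),
    (80.0, "preferred"),
    (70.0, "general"),
]


def identity_tier(pct: float) -> str:
    """Map a percent identity onto the tiers named in the definition."""
    for threshold, label in TIERS:
        if pct >= threshold:
            return label
    return "below threshold"
```

For example, two aligned ten-residue sequences that differ at one position are 90% identical under this convention and fall in the "more preferred" tier.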
As used herein, substantially identical to a product means sufficiently
similar so that the property of interest is sufficiently unchanged so that the
substantially identical product can be used in place of the product.

As used herein, isolated means that a substance is either present in a
preparation at a concentration higher than that at which the substance is found in
nature or in its naturally occurring state or that the substance is present in a
preparation that contains other materials with which the substance is not
associated in nature. As an example of the latter, an isolated meg PKS protein
includes a meg PKS protein expressed in a Streptomyces coelicolor or S. lividans
host cell.

As used herein, substantially pure means sufficiently homogeneous to
appear free of readily detectable impurities as determined by standard methods of
analysis, such as thin layer chromatography (TLC), gel electrophoresis and high
performance liquid chromatography (HPLC), used by those of skill in the art to
assess such purity, or sufficiently pure such that further purification would not
detectably alter the physical and chemical properties, such as enzymatic and
biological activities, of the substance. Methods for purification of the compounds
to produce substantially chemically pure compounds are known to those of skill in
the art. A substantially chemically pure compound may, however, be a mixture of
stereoisomers or isomers. In such instances, further purification might increase
the specific activity of the compound.
As used herein, vector or plasmid refers to discrete elements that are used
to introduce heterologous DNA into cells for either expression or replication
thereof. Selection and use of such vehicles are well known within the skill of the
artisan. An expression vector includes vectors capable of expressing DNAs that
are operatively linked with regulatory sequences, such as promoter regions, that
are capable of effecting expression of such DNA fragments. Thus, an expression
vector refers to a recombinant DNA or RNA construct, such as a plasmid, a phage,
recombinant virus or other vector that, upon introduction into an appropriate host
cell, results in expression of the cloned DNA. Appropriate expression vectors are
well known to those of skill in the art and include those that are replicable in
eukaryotic cells and/or prokaryotic cells and those that remain episomal or those
which integrate into the host cell genome.

Section II. Megalomicins 

The megalomicins were discovered in 1969 at Schering Corp. as
antibacterial agents produced by Micromonospora megalomicea (see Weinstein et
al., 1969, J. Antibiotics 22: 253-258, and U.S. Patent No. 3,632,750, both of
which are incorporated herein by reference). Although the initial structural
assignment was in error, a thorough reassessment of NMR data coupled with an
X-ray crystal structure of a megalomicin A derivative (see Nakagawa and Omura,
"Structure and Stereochemistry of Macrolides" in Macrolide Antibiotics (S.
Omura, ed.), Academic Press, NY, 1984, incorporated herein by reference)
established the structures shown in Figure 3. The megalomicins are
6-O-glycosides of erythromycin C with acetyl or propionyl groups esterified at the 3'''
or 4''' hydroxyls of the mycarose sugar at the C-3-position. The C-6 sugar has
been named "megosamine," although it had been identified 5 to 10 years earlier as
L-rhodosamine or N-dimethyldaunosamine, deoxyamino sugars commonly present
in the anthracycline antitumor drugs. The antibacterial potency, spectrum of
activity, and toxicity (LD50 acute, 7-7.5 g/kg s.c. or oral; subacute, >500 mg/kg) of
the megalomicins are similar to those of erythromycin A.

The megalomicins have two modes of biological activity. As antibacterials, 
they act like the erythromycins, which inhibit protein synthesis at the translocation 
step by selective binding to the bacterial 50S ribosomal subunit. They also affect
protein trafficking in eukaryotic cells (see Bonay et al., 1996, J. Biol. Chem.
271:3719-3726, incorporated herein by reference). Although the mechanism of
action is not entirely clear, it appears to involve inhibition of vesicular transport 
between the medial and trans Golgi, resulting in under-sialylation of proteins. The
megalomicins also strongly inhibit the ATP-dependent acidification of lysosomes
in vivo (see Bonay et al., 1997, J. Cell Sci. 110:1839-1849, incorporated herein by
reference) and cause an anomalous glycosylation of viral proteins, which may be
responsible for their antiviral activity against herpes (TOX50, 70-100 µM; see
Alarcon et al., 1984, Antivir. Res. 4:231-243, and Alarcon et al., 1988, FEBS Lett.
231:207-211, both of which are incorporated herein by reference).

Strikingly, the megalomicins are potent antiparasitic agents, showing an 
IC50 of 1 ng/ml in blocking intracellular replication of Plasmodium falciparum 
infected erythrocytes (see Bonay et al., 1998, Antimicrob. Agents Chemother.
42:2668-2673, incorporated herein by reference). The megalomicins are effective
against Trypanosoma cruzi and T. brucei (IC50, 0.2-2 µg/ml) plus Leishmania
donovani and L. major promastigotes (IC50, 3 and 8 ng/ml, respectively). 
Megalomicin is also active against the intracellular replicative, amastigote form of 
T. cruzi, completely preventing its replication in infected murine LLC/MK2 
macrophages at a dose of 5 ng/ml. Importantly, the effective drug concentration is
500-fold less than the acute LD50 in mammals, and there is no toxicity to BALB/c
mice at doses (50 mg/kg) that are completely curative for T. brucei infections. 
Because the erythromycins do not have such activity, although azithromycin 
(Figure 3) has been reported to be an effective acute and prophylactic treatment for 
malaria caused by P. vivax and P. falciparum (see Taylor et al., 1999, Clin. Infect.
Dis. 28:74-81, incorporated herein by reference), the antiparasitic action of the
megalomicins is unique and probably related to the presence of the deoxyamino 
sugar megosamine at C-6 (Figure 3). Consequently, the megalomicins could be 
developed into potent antimalarial drugs with a high therapeutic index and be 
active against P. falciparum and other species that are resistant to currently used
classes of antimalarials. They also could lead to potent antiparasitic agents against
leishmaniasis, trypanosomiasis, and Chagas' disease. In view of the widespread 
use of the erythromycins and their good oral availability plus the low mammalian 
toxicity of macrolides in general, the megalomicins could be used prophylactically
to combat malaria, and as fermentation products, the megalomicins should be
relatively inexpensive to produce.

The megalomicins belong to the polyketide class of natural products whose 
members have diverse structural and pharmacological properties (see Monaghan
and Tkacz, 1990, Annu. Rev. Microbiol. 44:271, incorporated herein by
reference). The megalomicins are assembled by polyketide synthases through 
successive condensations of activated coenzyme-A thioester monomers derived 
from small organic acids such as acetate, propionate, and butyrate. Active sites 
required for condensation include an acyltransferase (AT), acyl carrier protein
(ACP), and beta-ketoacylsynthase (KS). Each condensation cycle results in a
beta-keto group that undergoes all, some, or none of a series of processing activities.
Active sites that perform these reactions include a ketoreductase (KR), 
dehydratase (DH), and enoylreductase (ER). Thus, the absence of any beta-keto 
processing domain results in the presence of a ketone, a KR alone gives rise to a
hydroxyl, a KR and DH result in an alkene, while a KR, DH, and ER combination
leads to complete reduction to an alkane. After assembly of the polyketide chain, 
the molecule typically undergoes cyclization(s) and post-PKS modification (e.g. 
glycosylation, oxidation, acylation) to achieve the final active compound. 
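
The correspondence between the reductive domains of a module and the resulting functionality at the beta-carbon can be expressed as a small lookup. The following Python sketch is illustrative only: the function name `beta_carbon_state` is hypothetical, and the assumption that a DH acts only after a KR (and an ER only after a KR and a DH) is a simplification made for the example.

```python
def beta_carbon_state(domains):
    """Return the group left at the beta-carbon for a module's domains.

    `domains` is any iterable of domain abbreviations, e.g. ["KS", "AT", "KR", "ACP"].
    Assumption: DH acts only after KR, and ER only after KR and DH.
    """
    present = set(domains)
    if "KR" not in present:
        return "ketone"      # no beta-keto processing
    if "DH" not in present:
        return "hydroxyl"    # KR alone
    if "ER" not in present:
        return "alkene"      # KR + DH
    return "alkane"          # KR + DH + ER


if __name__ == "__main__":
    for doms in ([], ["KR"], ["KR", "DH"], ["KR", "DH", "ER"]):
        print(doms, "->", beta_carbon_state(doms))
```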

Macrolides such as erythromycin and megalomicin are synthesized by
modular PKSs (see Cane et al., 1998, Science 282: 63, incorporated herein by
reference). For illustrative purposes, the PKS that produces the erythromycin 
polyketide (6-deoxyerythronolide B synthase or DEBS; see U.S. Patent No. 
5,824,513, incorporated herein by reference) is shown in Figure 4. DEBS is the 
best characterized and most extensively used modular PKS system. DEBS is
particularly relevant to the present invention in that it synthesizes the same
polyketide, 6-deoxyerythronolide B (6-dEB), synthesized by the megalomicin 
PKS. In modular PKS enzymes such as DEBS and the megalomicin PKS, the 
enzymatic steps for each round of condensation and reduction are encoded within 
a single "module" of the polypeptide (i.e., one distinct module for every
condensation cycle). DEBS consists of a loading module and 6 extender modules
and a chain terminating thioesterase (TE) domain within three extremely large 
polypeptides encoded by three open reading frames (ORFs, designated eryAI, 
eryAII, and eryAIII).

Each of the three polypeptide subunits of DEBS (DEBSI, DEBSII, and
DEBSIII) contains 2 extender modules; DEBSI additionally contains the loading
module. Collectively, these proteins catalyze the condensation and appropriate 
reduction of 1 propionyl CoA starter unit and 6 methylmalonyl CoA extender 
units. Modules 1, 2, 5, and 6 contain KR domains; module 4 contains a complete
set, KR/DH/ER, of reductive and dehydratase domains; and module 3 contains no 
functional reductive domain. Following the condensation and appropriate 
dehydration and reduction reactions, the enzyme bound intermediate is lactonized 
by the TE at the end of extender module 6 to form 6-dEB. 
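
The carbon bookkeeping implied by one propionyl starter and six methylmalonyl extenders can be checked with a short calculation. This Python sketch is a minimal illustration; the per-unit carbon counts (three carbons from the propionyl group, and two backbone carbons plus one methyl branch from each methylmalonyl unit after decarboxylation) are standard chemistry supplied as assumptions rather than values stated in the text, and the names are hypothetical.

```python
STARTER_CARBONS = 3            # propionyl group
EXTENDER_BACKBONE_CARBONS = 2  # added to the chain per condensation
EXTENDER_BRANCH_CARBONS = 1    # methyl branch carried by methylmalonyl
N_EXTENDER_MODULES = 6         # DEBS extender modules 1 through 6


def chain_carbons(n_modules=N_EXTENDER_MODULES):
    """Return (backbone carbons, total carbons) after n_modules condensations."""
    backbone = STARTER_CARBONS + n_modules * EXTENDER_BACKBONE_CARBONS
    branches = n_modules * EXTENDER_BRANCH_CARBONS
    return backbone, backbone + branches


if __name__ == "__main__":
    backbone, total = chain_carbons()
    print(f"backbone carbons: {backbone}, total carbons: {total}")
```

Under these assumptions the full set of six extensions gives a 15-carbon backbone and 21 carbons in all.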

More particularly, the loading module of DEBS consists of two domains,
an acyl-transferase (AT) domain and an acyl carrier protein (ACP) domain. In
other PKS enzymes, the loading module is not composed of an AT and an ACP 
but instead utilizes an inactivated KS, an AT, and an ACP. This inactivated KS is 
in most instances called KS^Q, where the superscript letter is the abbreviation for
the amino acid, glutamine, that is present instead of the active site cysteine
required for activity. The AT domain of the loading module recognizes a 
particular acyl-CoA (propionyl for DEBS, which can also accept acetyl) and 
transfers it as a thiol ester to the ACP of the loading module. Concurrently, the AT 
on each of the extender modules recognizes a particular extender-CoA
(methylmalonyl for DEBS) and transfers it to the ACP of that module to form a
thioester. Once the PKS is primed with acyl- and malonyl-ACPs, the acyl group of 
the loading module migrates to form a thiol ester (trans-esterification) at the KS of 
the first extender module; at this stage, extender module 1 possesses an acyl-KS
and a methylmalonyl ACP. The acyl group derived from the loading module is
then covalently attached to the alpha-carbon of the malonyl group to form a
carbon-carbon bond, driven by concomitant decarboxylation, and generating a new
acyl-ACP that has a backbone two carbons longer than the loading unit 
(elongation or extension). The growing polyketide chain is transferred from the 
ACP to the KS of the next module, and the process continues. 

The polyketide chain, growing by two carbons each module, is sequentially
passed as a covalently bound thiol ester from module to module, in an assembly
line-like process. The carbon chain produced by this process alone would possess 
a ketone at every other carbon atom, producing a polyketone, from which the
name polyketide arises. Commonly, however, the beta-keto group of each two-carbon
unit is modified just after it has been added to the growing polyketide
chain but before it is transferred to the next module by either a KR, a KR plus a
DH, or a KR, a DH, and an ER. As noted above, modules may contain additional
enzymatic activities as well.

Once a polyketide chain traverses the final extender module of a PKS, it 
encounters the releasing domain or thioesterase found at the carboxyl end of most 
PKSs. Here, the polyketide is cleaved from the enzyme and cyclized. The
resulting polyketide can be modified further by tailoring or modification enzymes;
these enzymes add carbohydrate groups or methyl groups, or make other
modifications, e.g., oxidation or reduction, on the polyketide core molecule. For
example, the final steps in conversion of 6-dEB to erythromycin A include the
actions of a number of modification enzymes, such as: C-6 hydroxylation,
attachment of mycarose and desosamine sugars, C-12 hydroxylation (which
produces erythromycin C), and conversion of mycarose to cladinose via
O-methylation, as shown in Figure 5.

With this overview of PKS and post-PKS modification enzymes, one can 
better appreciate the recombinant megalomicin biosynthetic genes provided by the 
invention and their function, as described in the following Section. 


Section III: The Megalomicin Biosynthetic Genes and Nucleic Acid Fragments 

The megalomicin PKS was isolated and cloned by the following
procedure. Genomic DNA was isolated from a megalomicin producing strain of
Micromonospora megalomicea subsp. nigra (ATCC 27598), partially digested
with a restriction enzyme, and cloned into a commercially available cosmid vector
to produce a genomic library. This library was then probed with probes generated
from the erythromycin biosynthetic genes as well as from cosmids identified as
containing sequences homologous to erythromycin biosynthetic genes. This
probing identified a set of cosmids, which were analyzed by DNA sequence
analysis and restriction enzyme digestion. The analysis revealed that the desired DNA
had been isolated and that the entire PKS gene cluster was contained in
overlapping segments on four of the cosmids identified. Figure 1 shows the
cosmids, and the portions of the megalomicin biosynthetic gene cluster in the
insert DNA of the cosmids. Figure 1 shows that the complete megalomicin
biosynthetic gene cluster is contained within the insert DNA of cosmids
pKOS079-138B, pKOS079-124B, pKOS079-93D, and pKOS079-93A. Each of
these cosmids has been deposited with the American Type Culture Collection in
accordance with the terms of the Budapest Treaty (cosmid pKOS079-138B is
available under accession no. ATCC ______; cosmid pKOS079-124B is available
under accession no. ATCC ______; cosmid pKOS079-93D is available under
accession no. ATCC ______; and cosmid pKOS079-93A is available under
accession no. ATCC ______). Various additional reagents of the invention can be
isolated from these cosmids. DNA sequence analysis was also performed on the
various subclones of the invention, as described herein. Further analysis of these
cosmids and subclones prepared from the cosmids facilitated the identification of
the location of various megalomicin biosynthetic genes, including the ORFs
encoding the PKS, modules encoded by those ORFs, and coding sequences for
megalomicin modification enzymes. The location of these genes and modules is
shown on Figure 2.

Those of skill in the art will recognize that, due to the degenerate nature of 
the genetic code, a variety of DNA compounds differing in their nucleotide 
sequences can be used to encode a given amino acid sequence of the invention.
The native DNA sequence encoding the megalomicin PKS and other biosynthetic
enzymes of Micromonospora megalomicea is
shown herein merely to illustrate a preferred embodiment of the invention, and the 
invention includes DNA compounds of any sequence that encode the amino acid 
sequences of the polypeptides and proteins of the invention. In similar fashion, a
polypeptide can typically tolerate one or more amino acid substitutions, deletions,
and insertions in its amino acid sequence without loss or significant loss of a 
desired activity. The present invention includes such polypeptides with alternate 
amino acid sequences, and the amino acid sequences encoded by the DNA 
sequences shown herein merely illustrate preferred embodiments of the invention. 
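
The degeneracy of the genetic code can be made concrete by counting how many distinct DNA sequences encode a given peptide. The Python sketch below is illustrative: it uses the codon multiplicities of the standard genetic code, and the function name `count_encodings` and the four-residue example peptide are hypothetical.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
# (one-letter amino acid codes; stop codons are not included).
CODONS_PER_AA = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2,
    "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def count_encodings(peptide):
    """Return the number of distinct coding sequences for a peptide string."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())


if __name__ == "__main__":
    example = "MKLS"  # hypothetical peptide used only for illustration
    print(example, "can be encoded by", count_encodings(example), "DNA sequences")
```

Because the count is a product over residues, it grows very rapidly with length, which is why the native sequence is only one of an enormous number of DNA compounds encoding the same protein.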

The recombinant nucleic acids, proteins, and peptides of the invention are
many and diverse. To facilitate an understanding of the invention and the diverse
compounds and methods provided thereby, the following description of the
various regions of the megalomicin PKS and the megalomicin modification
enzymes and corresponding coding sequences is provided. To facilitate description
of the invention, reference to a PKS, protein, module, or domain herein can also
refer to DNA compounds comprising coding sequences therefor and vice versa.
Also, unless otherwise indicated, reference to a heterologous PKS refers to a PKS
or DNA compounds comprising coding sequences therefor from an organism
other than Micromonospora megalomicea. In addition, reference to a PKS or its
coding sequence includes reference to any portion thereof.

Thus, the invention provides DNA molecules in isolated (i.e., not pure, but 
existing in a preparation in an abundance and/or concentration not found in nature)
and purified (i.e., substantially free of contaminating materials or substantially free
of materials with which the corresponding DNA would be found in nature) form. 
The DNA molecules of the invention comprise one or more sequences that encode 
one or more domains (or fragments of such domains) of one or more modules in 
one or more of the ORFs of the megalomicin PKS and sequences that encode
megalomicin modification enzymes from the megalomicin biosynthetic gene
cluster. Examples of PKS domains include the KS, AT, DH, KR, ER, ACP, and 
TE domains of at least one of the 6 extender modules and loading module of the 
three proteins encoded by the three ORFs of the megalomicin PKS gene cluster. 
Examples of megalomicin modification enzymes include those that synthesize the
mycarose, desosamine, and megosamine moieties, those that transfer those sugar
moieties to the polyketide 6-dEB, those that hydroxylate the polyketide at C-6 and
C-12, and those that acylate the sugar moieties.

In an especially preferred embodiment, the DNA molecule is a 
recombinant DNA expression vector or plasmid, as described in more detail in the
following Section. Generally, such vectors can either replicate in the cytoplasm of
the host cell or integrate into the chromosomal DNA of the host cell. In either 
case, the vector can be a stable vector (i.e., the vector remains present over many 
cell divisions, even if only with selective pressure) or a transient vector (i.e., the 
vector is gradually lost by host cells with increasing numbers of cell divisions). 

The megalomicin PKS gene cluster comprises three ORFs (megAI, megAII,
and megAIII). Each ORF encodes two extender modules of the PKS; the first ORF
also encodes the loading module. Each extender module is composed of at least a
KS, an AT, and an ACP domain. The locations of the various encoding regions of
these ORFs are shown in Figure 2 and described with reference to the sequence
information below. The megalomicin PKS produces the polyketide known as
6-dEB, shown in Figure 4. In megalomicin-producing organisms, 6-dEB is
converted to erythromycin C by a set of modification enzymes. Thus, 6-dEB is
converted to erythronolide B by the megF gene product (a homolog of the eryF
gene product), then to 3-alpha-mycarosyl-erythronolide B by the megBV gene
product (a homolog of the eryBV gene product), then to erythromycin D by the
megCIII gene product (a homolog of the eryCIII gene product), then to
erythromycin C by the megK gene product (a homolog of the eryK gene product).

In addition to these modification enzymes, such megalomicin-producing
organisms also contain the modification enzymes necessary for the biosynthesis of
the desosamine and mycarose moieties that are similarly utilized in erythromycin
biosynthesis, as shown in Figure 5. Megalomicin A contains the complete
erythromycin C structure, and its biosynthesis additionally involves the formation
of L-megosamine (L-rhodosamine) and its attachment to the C-6 hydroxyl
(Figures 3 and 5, inset), followed by acylation of the C-3''' and/or C-4'''
hydroxyls as the terminal steps. L-megosamine is the same as
N-dimethyl-L-daunosamine; the daunosamine genes have been characterized from Streptomyces
peucetius (see Colombo and Hutchinson, J. Indust. Microbiol. Biotechnol., in
press; Otten et al., 1996, J. Bacteriol. 178:7316-7321, and references cited therein).
Some of the rhodosamine genes also have been cloned and partially characterized
from another anthracycline producing Streptomyces sp. (see Torkkell et al., 1997,
Mol. Gen. Genet. 256(2):203-209). Because the timing of the glycosylation with
TDP-megosamine in relation to the addition of mycarose and desosamine to
erythronolide B, plus the C-12 hydroxylation, is unknown, the pathway could
involve a different order of glycosylation and C-12 hydroxylation steps than the
one shown in Figure 5. Regardless, the megalomicin biosynthetic gene cluster
contains the genes to make L-rhodosamine and attach it to the correct macrolide
substrate.
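
The stepwise conversion of 6-dEB to erythromycin C described above can be written as an ordered list of gene, substrate, and product. The Python sketch below is a simplified illustration that assumes a strictly linear pathway in which disruption of one gene causes the substrate of the missing enzyme to accumulate; as noted above, the actual order of steps may differ, and real enzymes may accept alternative substrates. The function name `expected_product` is hypothetical.

```python
# (gene, substrate, product) for each step, in the order given in the text.
PATHWAY = [
    ("megF", "6-dEB", "erythronolide B"),
    ("megBV", "erythronolide B", "3-alpha-mycarosyl-erythronolide B"),
    ("megCIII", "3-alpha-mycarosyl-erythronolide B", "erythromycin D"),
    ("megK", "erythromycin D", "erythromycin C"),
]


def expected_product(disrupted=()):
    """Follow the linear pathway until a disrupted gene is reached."""
    compound = PATHWAY[0][1]
    for gene, _substrate, product in PATHWAY:
        if gene in disrupted:
            break  # assumption: the block is complete and nothing bypasses it
        compound = product
    return compound


if __name__ == "__main__":
    print("intact pathway:", expected_product())
    print("megCIII disrupted:", expected_product({"megCIII"}))
```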

The biosynthetic pathways to make the glycosides desosamine, mycarose,
and megosamine are shown in Figure 6. The present invention provides the genes
for each biosynthetic pathway shown in this Figure, and these recombinant genetic
pathways can be used alone or in any combination to confer the pathway to a
heterologous host. 

The megalomicin PKS locus is similar to the eryA locus in size and
organization. Most of the deoxysugar biosynthesis genes are homologs of the eryB
mycarose and eryC desosamine biosynthesis and glycosyl attachment genes from
Saccharopolyspora erythraea (see Summers et al., 1997, Microbiol. 143:3251-3262;
Haydock et al., 1991, Mol. Gen. Genet. 230:120-128; Gaisser et al., 1997,
Mol. Gen. Genet. 256:239-251; Gaisser et al., 1998, Mol. Gen. Genet. 257:78-88,
incorporated herein by reference) or the picC homologs from the picromycin and
narbomycin producer (see PCT patent publication No. 99/61599 and Xue et al.,
1998, Proc. Nat. Acad. Sci. USA 95:12111-12116, incorporated herein by
reference). The TDP-megosamine biosynthesis genes are homologs of the dnm
genes (see Figure 5) and the pikromycin N-dimethyltransferase gene or its
homologs reported in a cluster of L-rhodosamine biosynthesis genes. The putative
TDP-megosamine glycosyltransferase gene product (geneX in Figure 5) closely
resembles the deduced products of the eryBV, eryCIII, dnmS, and pikromycin
desVII genes, even though it recognizes different substrates than the products of
each of these genes.

The following Table 1 shows the location of the genes in the
Micromonospora megalomicea megalomicin biosynthetic pathway in the DNA
sequence set forth in SEQ ID NO:1 (see also Figure 7; note some gene
designations may be different in Figure 7).

Table 1. Megalomicin Biosynthetic Gene Cluster
Micromonospora megalomicea subsp. nigra (ATCC 27598)

Location                     Description
1..2451                      sequence from cosmid pKOS079-138B
complement(1..144)           megBVI (or megT), TDP-4-keto-6-deoxyglucose-2,3-dehydratase
928..2061                    megDVI, TDP-4-keto-6-deoxyglucose 3,4-isomerase
2072..3382                   megDI, TDP-megosaminyl transferase (eryCIII homolog)
2452..40397                  sequence of cosmid pKOS079-93D
3462..4634                   megG (or megY), mycarosyl acyltransferase
4651..5775                   megDII, deoxysugar transaminase (eryCI, DnrJ homolog)
5822..6595                   megDIII, TDP-daunosaminyl-N,N-dimethyltransferase (eryCVI homolog)
6592..7197                   megDIV, TDP-4-keto-6-deoxyglucose 3,5-epimerase (eryBVII, dnmU homolog)
7220..8206                   megDV, TDP-hexose 4-ketoreductase (eryBIV, dnmV homolog)
complement(8228..9220)       megBII-1 or megDVII, TDP-4-keto-L-6-deoxy-hexose 2,3-reductase
complement(9226..10479)      megBV, TDP-mycarosyl transferase
complement(10483..11424)     megBIV, TDP-hexose 4-ketoreductase
12181..22821                 megAI
12181..13791                 Loading Module (L)
12505..13470                 AT-L
13576..13791                 ACP-L
13849..18207                 Extender Module 1 (1)
13849..15126                 KS1
15427..16476                 AT1
17155..17694                 KR1
17947..18207                 ACP1
18268..22575                 Extender Module 2 (2)
18268..19548                 KS2
19876..20910                 AT2
21517..22053                 KR2
22318..22575                 ACP2
22867..33555                 megAII
22957..27258                 Extender Module 3 (3)
22957..24237                 KS3
24544..25581                 AT3
26230..26733                 KR3 (inactive)
26998..27258                 ACP3
27313..33312                 Extender Module 4 (4)
27393..28590                 KS4
28897..29931                 AT4
29953..30477                 DH4
31396..32244                 ER4
322S7..32799                 KR4
33052..33312                 ACP4
33666..43271                 megAIII
33780..38120                 Extender Module 5 (5)
33780..35027                 KS5
35385..36419                 AT5
37068..37604                 KR5
37860..38120                 ACP5
38187..42425                 Extender Module 6 (6)
38187..39470                 KS6
39795..40811                 AT6
40398..46641                 sequences from cosmid pKOS079-93A
41406..41936                 KR6
42168..42425                 ACP6
42585..43271                 TE
43268..44344                 megCII, TDP-4-keto-6-deoxyglucose 3,4-isomerase
443S5..45623                 megCIII, TDP-desosaminyl transferase
45620..46591                 megBII, TDP-4-keto-6-deoxy-L-glucose 2,3-dehydratase
complement(46660..47403)     megH, TEII
complement(47411..47980)     megF, C-6 hydroxylase

In a specific embodiment, the invention provides an isolated nucleic acid 
fragment comprising a nucleotide sequence encoding a domain of the 
megalomicin polyketide synthase or a megalomicin modification enzyme. The 
isolated nucleic acid fragment can be a DNA or an RNA. Preferably, the isolated
nucleic acid fragment is a recombinant DNA compound. A nucleotide sequence
that is complementary to the nucleotide sequence encoding a domain of 
megalomicin PKS or a megalomicin modification enzyme is also provided. 

The isolated nucleic acid fragment can comprise a single, multiple or all
the open reading frame(s) (ORF) of the megalomicin PKS or the megalomicin
modification enzyme. Exemplary ORFs of megalomicin PKS include the ORFs of
the megAI, megAII and megAIII genes. The isolated nucleic acids of the invention
also include nucleic acids that encode one or more domains and one or more 
modules of the megalomicin PKS. Exemplary domains of the megalomicin PKS 
include a TE domain, a KS domain, an AT domain, an ACP domain, a KR
domain, a DH domain and an ER domain. In a preferred embodiment, the nucleic
acid comprises the coding sequence for a loading module, a thioesterase domain, 
and all six extender modules of the megalomicin PKS. 

Megalomicin modification enzymes include those enzymes involved in the 
conversion of 6-dEB into a megalomicin such as the enzymes encoded by megF,
megBV, megCIII, megK, megDI and megG (or megY). Megalomicin modification
enzymes also include those enzymes involved in the biosynthesis of mycarose, 
megosamine or desosamine, which are used as biosynthetic intermediates in the 
biosynthesis of various megalomicin species and other related polyketides. The 
enzymes that are involved in biosynthesis of mycarose, megosamine or
desosamine are described in Figures 5 and 10. The megalomicin PKS and
megalomicin modification enzymes are collectively referred to as megalomicin
biosynthetic enzymes; the genes encoding such enzymes are collectively referred
to as megalomicin biosynthetic genes; and nucleic acids that comprise a portion of
or entire megalomicin biosynthetic genes are collectively referred to as
megalomicin biosynthetic nucleic acid(s).

In specific embodiments, the megalomicin biosynthetic nucleic acids
comprise the sequence of SEQ ID NO:1, or the coding regions thereof, or
nucleotide sequences encoding, in whole or in part, a megalomicin biosynthetic
enzyme protein. The isolated nucleic acids typically consist of at least 25
(continuous) nucleotides, 50 nucleotides, 100 nucleotides, 150 nucleotides, or 200
nucleotides of megalomicin biosynthetic nucleic acid sequence, or a full-length
megalomicin biosynthetic coding sequence. In another embodiment, the nucleic 
acids are smaller than 35, 200, or 500 nucleotides in length. Nucleic acids can be 
single or double stranded. Nucleic acids that hybridize to or are complementary to 
the foregoing sequences, in particular the inverse complement to nucleic acids that
hybridize to the foregoing sequences (i.e., the inverse complement of a nucleic
acid strand has the complementary sequence running in reverse orientation to the 
strand so that the inverse complement would hybridize without mismatches to the 
nucleic acid strand) are also provided. In specific aspects, nucleic acids are 
provided which comprise a sequence complementary to (specifically are the
inverse complement of) at least 10, 25, 50, 100, or 200 nucleotides or the entire
coding region of a megalomicin biosynthetic gene. 
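
The inverse complement defined above (the complementary sequence read in the reverse orientation) is straightforward to compute. The following Python sketch is a minimal illustration; the function name `inverse_complement` and the example sequence are hypothetical, and only unambiguous DNA bases are handled.

```python
# Translation table for the four DNA bases, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def inverse_complement(seq):
    """Return the complementary strand written in the reverse orientation."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    strand = "ATGC"  # hypothetical example sequence
    print(strand, "->", inverse_complement(strand))
```

Applying the function twice returns the original strand, which reflects the fact that the two strands of a duplex are inverse complements of each other.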

The megalomicin biosynthetic nucleic acids provided herein include those 
with nucleotide sequences encoding substantially the same amino acid sequences 
as found in native megalomicin biosynthetic enzyme proteins, and those encoding
amino acid sequences with functionally equivalent amino acids, as well as
megalomicin biosynthetic enzyme derivatives or analogs as described in Section
IV. 

Some regions within the megalomicin PKS genes are highly homologous 
or identical to one another, as can be readily identified by an analysis of the
sequence. The coding sequence for the KS and AT domains of module 2 shares
significant identity with the coding sequence for the KS and AT domains of
module 6. This sequence homology or identity at the nucleic acid, e.g., DNA, level
can render the nucleic acid unstable in certain host cells. To improve the stability
of the nucleic acids comprising a portion or the entire megalomicin PKS genes and
megalomicin modification enzyme genes, the nucleic acid or DNA sequences can
be changed to reduce or abolish the sequence homology or identity. Preferably,
the DNA codons of homologous regions within the PKS or the megalomicin
modification enzyme coding sequence are changed to reduce or abolish the
sequence homology or identity without changing the amino acid sequences 
encoded by said changed DNA codons (see the examples below). The stability of 
the nucleic acid or DNA can also be improved by codon changes that reduce or 
abolish the sequence homology or identity while also changing the amino acid
sequence, provided that the amino acid sequence change(s) does not substantially
change the desired activity of the encoded megalomicin PKS. Thus, for example,
one can simply substitute for the megAIII ORF an ORF from eryAIII, oleAIII,
picAIII, or picAIV genes.
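
Changing codons to reduce nucleotide identity without changing the encoded amino acids can be sketched as follows. This Python example is illustrative only: it uses the standard genetic code, picks for each codon the synonymous codon that differs at the most positions (a greedy rule assumed for the example, not a method taken from the disclosure), and ignores practical considerations such as the codon usage of the host. All names and the example sequence are hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame, upper-case DNA string into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna):
    """Replace each codon with the synonymous codon that differs at the most positions."""
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        out.append(max(synonyms, key=lambda c: sum(x != y for x, y in zip(c, codon))))
    return "".join(out)


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    original = "ATGGCCCTGTCG"  # hypothetical fragment encoding M-A-L-S
    changed = recode(original)
    print(original, changed, translate(changed), f"{identity(original, changed):.2f}")
```

The recoded fragment translates to the same peptide while sharing fewer nucleotide positions with the original, which is the property relied on to suppress recombination between repeated regions.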

The recombinant DNA compounds of the invention that encode the
megalomicin PKS and modification proteins or portions thereof are useful in a
variety of applications. While many of these applications relate to the heterologous
expression of the megalomicin biosynthetic genes or the construction of hybrid 
PKS enzymes, many useful applications involve the natural megalomicin producer 
Micromonospora megalomicea. For example, one can use the recombinant DNA
compounds of the invention to disrupt the megalomicin biosynthetic genes by
homologous recombination in Micromonospora megalomicea. The resulting host 
cell is a preferred host cell for making polyketides modified by oxidation, 
hydroxylation, glycosylation, and acylation in a manner similar to megalomicin, 
because the genes that encode the proteins that perform these reactions are of
course present in the host cell, and because the host cell does not produce
megalomicin that could interfere with production or purification of the polyketide
of interest. 

One illustrative recombinant host cell provided by the present invention 
expresses a recombinant megalomicin PKS in which the module 1 KS domain is
inactivated by deletion or other mutation. In a preferred embodiment, the
inactivation is mediated by a change in the KS domain that renders it incapable of
binding substrate (called a KS1° mutation). In a particularly preferred
embodiment, this inactivation is rendered by a mutation in the codon for the active
site cysteine that changes the codon to another codon, such as an alanine codon.
Such constructs are especially useful when placed in translational reading frame
with extender modules 1 and 2 of a megalomicin PKS or the corresponding modules of
another PKS. The utility of these constructs is that host cells expressing, or
cell-free extracts containing, a PKS comprising the protein encoded thereby can be fed
or supplied with N-acylcysteamine thioesters of precursor molecules to prepare a 
polyketide of interest. See U.S. patent application Serial No. 09/492,773, filed 27 
Jan. 2000, and PCT patent publication No. 00/44717, both of which are 
incorporated herein by reference. Such KS1° constructs of the invention are useful
in the production of 13-substituted-megalomicin compounds in Micromonospora
megalomicea host cells. Preferred compounds of the invention include those 
compounds in which the substituent at the 13-position is propyl, vinyl, propargyl, 
other lower alkyl, and substituted alkyl. 

In a variant of this embodiment, one can employ a megalomicin PKS in
which the ACP domain of module 1 has been rendered inactive. In another
embodiment, one can delete the loading domain of the megalomicin PKS and 
provide monoketide substrates for processing by the remainder of the PKS. 

The compounds of the invention can also be used to construct recombinant 
host cells of the invention in which coding sequences for one or more domains or
modules of the megalomicin PKS or for another megalomicin biosynthetic gene
have been deleted by homologous recombination with the Micromonospora 
megalomicea chromosomal DNA. Those of skill in the art will appreciate that the 
compounds used in the recombination process are characterized by their homology 
with the chromosomal DNA and not by encoding a functional protein due to their
intended function of deleting or otherwise altering portions of chromosomal DNA.
For this and a variety of other applications, the compounds of the present 
invention include not only those DNA compounds that encode functional proteins 
but also those DNA compounds that are complementary or identical to any portion 
of the megalomicin biosynthetic genes. 

Thus, the invention provides a variety of modified Micromonospora
megalomicea host cells in which one or more of the megalomicin biosynthetic
genes have been mutated or disrupted. Transformation systems for M.
megalomicea have been described by Hasegawa et al., 1991, J. Bacteriol.
173:7004-11; and Takada et al., 1994, J. Antibiot. 47:1167-1170, both of which
are incorporated herein by reference. These cells are especially useful when it is 
desired to replace the disrupted function with a gene product expressed by a 
recombinant DNA expression vector. While such expression vectors of the 
invention are described in more detail in the following Section, those of skill in
the art will appreciate that the vectors have application to M. megalomicea as well.
Such M. megalomicea host cells can be preferred host cells for expressing
megalomicin derivatives of the invention. Particularly preferred host cells of this 
type include those in which the coding sequence for the loading module has been 
mutated or disrupted, those in which one or more of any of the PKS gene ORFs
has been mutated or disrupted, and/or those in which the genes for one or more
modification enzymes (glycosylation, acylation, hydroxylation) have been mutated or
disrupted. 

While the present invention provides many useful compounds having 
application to, and recombinant host cells derived from, Micromonospora
megalomicea, many important applications of the present invention relate to the
heterologous expression of all or a portion of the megalomicin biosynthetic genes 
in cells other than M. megalomicea, as described in Section V. 



Section IV: The Megalomicin Biosynthetic Enzymes and Antibodies Recognizing
such Enzymes 

In another specific embodiment, the invention provides a substantially 
purified polypeptide, which is encoded by a nucleic acid fragment comprising a 
nucleotide sequence encoding a domain of megalomicin polyketide synthase
(PKS) or a megalomicin modification enzyme. The polypeptide can comprise a
single domain, multiple domains or a full-length megalomicin PKS or 
megalomicin modification enzyme. Functional fragments, analogs or derivatives 
of the megalomicin PKS or megalomicin modification enzyme polypeptides are 
also provided. Preferably, such fragments, analogs or derivatives can be
recognized by an antibody raised against a megalomicin PKS or megalomicin
modification enzyme. Also preferably, such fragments, analogs or derivatives 
comprise an amino acid sequence that has at least 60% identity, more preferably at 
least 90% identity to their wild type counterparts.

An exemplary nucleotide sequence encoding, and the corresponding amino
acid sequence of, a megalomicin biosynthetic enzyme is disclosed in SEQ ID
NO:1. Homologs (e.g., nucleic acids of the above-listed genes of species other
than Micromonospora megalomicea) or other related sequences (e.g., paralogs)
can be obtained by low, moderate or high stringency hybridization with all or a
portion of the particular sequence provided as a probe using methods well known 
in the art for nucleic acid hybridization and cloning (e.g., as described in Section 
III) in accordance with the methods of the present invention. 

The megalomicin biosynthetic enzyme proteins, or domains thereof, of the
present invention can be obtained by methods well known in the art for protein
purification and recombinant protein expression in accordance with the methods
of the present invention. For recombinant expression of one or more of the
proteins, the nucleic acid containing all or a portion of the nucleotide sequence
encoding the protein can be inserted into an appropriate expression vector, i.e., a
vector that contains the necessary elements for the transcription and translation of
the inserted protein coding sequence. Transcriptional and translational signals can 
be supplied by the native promoter for a megalomicin biosynthetic gene and/or 
flanking regions. 

A variety of host-vector systems may be utilized to express the protein
coding sequence. These include but are not limited to mammalian cell systems
infected with virus (e.g., vaccinia virus, adenovirus, and the like); insect cell
systems infected with virus (e.g., baculovirus); microorganisms such as yeast
containing yeast vectors; or bacteria transformed with bacteriophage DNA,
plasmid DNA, or cosmid DNA. The expression elements of vectors vary in their
properties. Depending on the host-vector system utilized, any one of a number of
suitable transcription and translation elements may be used. 

In a specific embodiment, a vector is used that comprises a promoter 
operably linked to nucleic acid sequences encoding a megalomicin biosynthetic 
enzyme, or a domain, fragment, derivative or homolog, thereof, one or more 
origins of replication, and optionally, one or more selectable markers (e.g., an
antibiotic resistance gene). 

Expression vectors containing the sequences of interest can be identified 
by three general approaches: (a) nucleic acid hybridization, (b) presence or
absence of "marker" gene function, and (c) expression of the inserted sequences.
In the first approach, megalomicin biosynthetic nucleic acid sequences can be 
detected by nucleic acid hybridization to probes comprising sequences 
homologous and complementary to the inserted sequences. In the second
approach, the recombinant vector/host system can be identified and selected based
upon the presence or absence of certain "marker" functions (e.g., binding to an 
anti-megalomicin biosynthetic enzyme antibody, resistance to antibiotics, 
occlusion body formation in baculovirus, and the like) caused by insertion of the 
sequences of interest in the vector. For example, if a megalomicin biosynthetic
gene, or portion thereof, is inserted within the marker gene sequence of the vector,
recombinants containing the megalomicin biosynthetic gene fragment will be 
identified by the absence of the marker gene function. In the third approach, 
recombinant expression vectors can be identified by assaying for the megalomicin 
biosynthetic gene products expressed by the recombinant vector. Such assays can
be based, for example, on the physical or functional properties of the interacting
species in in vitro assay systems, e.g., megalomicin synthesis activity or
immunoreactivity to antibodies specific for the protein.

Once recombinant megalomicin biosynthetic genes or nucleic acids are 
identified, several methods known in the art can be used to propagate them in
accordance with the methods of the present invention. Once a suitable host
system and growth conditions have been established, recombinant expression 
vectors can be propagated and amplified in quantity. As previously described, the 
expression vectors or derivatives which can be used include, but are not limited to: 
human or animal viruses such as vaccinia virus or adenovirus; insect viruses such
as baculovirus; yeast vectors; bacteriophage vectors such as lambda phage; and
plasmid and cosmid vectors. 

In addition, a host cell strain may be chosen that modulates the expression 
of the inserted sequences, or modifies or processes the expressed proteins in the 
specific fashion desired. Expression from certain promoters can be elevated in the
presence of certain inducers; thus expression of the genetically-engineered
megalomicin biosynthetic enzymes may be controlled. Furthermore, different host
cells have characteristic and specific mechanisms for the translational and
post-translational processing and modification (e.g., glycosylation, phosphorylation,
and the like) of proteins. Appropriate cell lines or host systems can be chosen to
ensure the desired modification and processing of the foreign protein is achieved. 
For example, expression in a bacterial system can be used to produce an 
unglycosylated core protein, while expression in mammalian cells ensures
"native" glycosylation of a heterologous protein. Furthermore, different
vector/host expression systems may effect processing reactions to different extents.

In particular, megalomicin biosynthetic enzyme derivatives can be made by 
altering their sequences by substitutions, additions or deletions that provide for 
functionally equivalent molecules. Due to the degeneracy of nucleotide coding
sequences, other DNA sequences which encode substantially the same amino acid
sequence as a megalomicin biosynthetic gene can be used in the practice of the
present invention. These include but are not limited to nucleotide sequences
comprising all or portions of megalomicin biosynthetic genes that are altered by
the substitution of different codons that encode the same amino acid residue within
the sequence, thus producing a silent change. Likewise, the megalomicin biosynthetic
enzyme derivatives of the invention include, but are not limited to, those 
containing, as a primary amino acid sequence, all or part of the amino acid 
sequence of megalomicin biosynthetic enzymes, including altered sequences in 
which functionally equivalent amino acid residues are substituted for residues
within the sequence resulting in a silent change. For example, one or more amino
acid residues within the sequence can be substituted by another amino acid of a 
similar polarity which acts as a functional equivalent, resulting in a silent 
alteration. Substitutes for an amino acid within the sequence may be selected 
from other members of the class to which the amino acid belongs. For example,
the nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine,
valine, proline, phenylalanine, tryptophan and methionine. The polar neutral 
amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and 
glutamine. The positively charged (basic) amino acids include arginine, lysine and 
histidine. The negatively charged (acidic) amino acids include aspartic acid and
glutamic acid.
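
By way of illustration only, the class-based substitution rule described above can be sketched in a few lines of Python. The class memberships are taken directly from the preceding paragraph; the names `AMINO_ACID_CLASSES`, `residue_class`, and `is_conservative` are hypothetical and form no part of the invention, and membership in the same class is only a rough proxy for functional equivalence.

```python
# Amino acid classes as listed in the text, using one-letter codes.
# All identifiers here are illustrative assumptions, not part of the disclosure.
AMINO_ACID_CLASSES = {
    "nonpolar": set("ALIVPFWM"),      # Ala, Leu, Ile, Val, Pro, Phe, Trp, Met
    "polar_neutral": set("GSTCYNQ"),  # Gly, Ser, Thr, Cys, Tyr, Asn, Gln
    "basic": set("RKH"),              # Arg, Lys, His
    "acidic": set("DE"),              # Asp, Glu
}


def residue_class(residue: str) -> str:
    """Return the name of the class that a one-letter residue code belongs to."""
    code = residue.upper()
    for name, members in AMINO_ACID_CLASSES.items():
        if code in members:
            return name
    raise ValueError(f"not one of the twenty common amino acids: {residue!r}")


def is_conservative(original: str, substitute: str) -> bool:
    """True when both residues fall in the same class listed in the text."""
    return residue_class(original) == residue_class(substitute)
```

For example, under this sketch a leucine-to-isoleucine change is treated as a candidate conservative substitution, whereas a lysine-to-aspartic acid change is not.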

In a specific embodiment of the invention, the nucleic acids encoding
proteins, and proteins, consisting of or comprising a domain or a fragment of a
megalomicin biosynthetic enzyme consisting of at least 6 (contiguous) amino
acids are provided. In other embodiments, the domain or fragment consists of at
least 10, 20, 30, 40, or 50 amino acids of a megalomicin biosynthetic enzyme. In
specific embodiments, such domains or fragments are not larger than 35, 100 or
200 amino acids. Derivatives or analogs of megalomicin biosynthetic enzyme
include but are not limited to molecules comprising regions that are substantially
homologous to megalomicin biosynthetic enzyme (in various embodiments, at least
30%, 40%, 50%, 60%, 70%, 80%, 90% or 95% identity over an amino acid
sequence of identical size or when compared to an aligned sequence in which the
alignment is done by a computer homology program known in the art in
accordance with the methods of the present invention) or whose encoding nucleic
acid is capable of hybridizing to a sequence encoding a megalomicin biosynthetic
enzyme under stringent, moderately stringent, or nonstringent conditions.
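
As a minimal sketch of the identity comparison described above, the following Python fragment computes percent identity between two amino acid sequences of identical size and tests the result against one of the recited thresholds. It assumes the sequences are already aligned and ungapped; the function names and the example sequences are hypothetical, and a real comparison would ordinarily rely on an alignment program known in the art.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions with identical residues in two equal-length sequences.

    Assumes the sequences are pre-aligned and contain no gap characters.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be of identical size")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(seq_a: str, seq_b: str, threshold: float) -> bool:
    """True when percent identity is at least the given threshold (e.g., 60 or 90)."""
    return percent_identity(seq_a, seq_b) >= threshold
```

With the invented ten-residue sequences "MKTAYIAKQR" and "MKSAYLAKQR", eight of ten positions match, so the pair would satisfy an 80% threshold but not a 90% threshold.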

The megalomicin biosynthetic enzyme domains, derivatives and analogs of 
the invention can be produced by various methods known in the art in accordance
with the methods of the present invention. The manipulations which result in their
production can occur at the gene or protein level. For example, the cloned 
megalomicin biosynthetic gene sequence can be modified by any of numerous 
strategies known in the art (Sambrook et al., 1990, Molecular Cloning, A 
Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory, Cold Spring Harbor,
New York) in accordance with the methods of the present invention. The
sequences can be cleaved at appropriate sites with restriction endonuclease(s),
followed by further enzymatic modification if desired, isolated, and ligated in
vitro.

Additionally, the megalomicin biosynthetic enzyme-encoding nucleotide
sequence can be mutated in vitro or in vivo, to create and/or destroy translation,
initiation, and/or termination sequences, or to create variations in coding regions
and/or form new restriction endonuclease sites or destroy pre-existing ones, to
facilitate further in vitro modification. Any technique for mutagenesis known in
the art can be used in accordance with the methods of the present invention,
including but not limited to, chemical mutagenesis and in vitro site-directed
mutagenesis (Hutchinson et al., J. Biol. Chem. 253:6551-6558 (1978)), use of
TAB® linkers (Pharmacia), and the like.

Once a recombinant cell expressing a megalomicin biosynthetic enzyme
protein, or a domain, fragment or derivative thereof, is identified, the individual 
gene product can be isolated and analyzed. This is achieved by assays based on 
the physical and/or functional properties of the protein, including, but not limited
to, radioactive labeling of the product followed by analysis by gel electrophoresis,
immunoassay, cross-linking to marker-labeled product, and the like. 

The megalomicin biosynthetic enzyme proteins may be isolated and
purified by standard methods known in the art, from natural sources or from
recombinant host cells expressing the complexes or proteins in accordance with
the methods of the invention, including but not restricted to column
chromatography (e.g., ion exchange, affinity, gel exclusion, reversed-phase high
pressure, fast protein liquid, and the like), differential centrifugation,
differential solubility, or by any other standard technique used for the
purification of proteins. Functional properties may be evaluated using any
suitable assay known in the art in accordance with the methods of the present
invention.

Alternatively, once a megalomicin biosynthetic enzyme or its domain or 
derivative is identified, the amino acid sequence of the protein can be deduced 
from the nucleotide sequence of the gene which encodes it. As a result, the 
protein or its domain or derivative can be synthesized by standard chemical
methods known in the art in accordance with the methods of the present invention
(see Hunkapiller et al., Nature 310:105-111 (1984)).
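
The deduction of an amino acid sequence from a coding sequence, as described above, can be sketched as follows. This is an illustrative Python fragment only: it assumes the standard genetic code, reads a single frame from the first base, stops at the first stop codon, and does not special-case alternative initiator codons. The names `CODON_TABLE` and `translate` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence: str) -> str:
    """Translate a DNA (or RNA) coding sequence in frame, stopping at the first stop codon."""
    seq = coding_sequence.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    protein = []
    for i in range(0, len(seq), 3):
        residue = CODON_TABLE[seq[i:i + 3]]  # KeyError on non-ACGT characters
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)
```

Because the code is degenerate, the invented sequences "ATGGCCCTGAAATAA" and "ATGGCCCTCAAATAA" (which differ only in the third base of the leucine codon) translate to the same peptide, which is the kind of silent change contemplated for the nucleic acids of the invention.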

Manipulations of megalomicin biosynthetic enzymes may be made at the 
protein level. Included within the scope of the invention are megalomicin 
biosynthetic enzyme domains, derivatives or analogs or fragments, which are
differentially modified during or after translation, e.g., by glycosylation,
acetylation, phosphorylation, amidation, derivatization by known 
protecting/blocking groups, proteolytic cleavage, linkage to an antibody molecule 
or other cellular ligand, and the like. Any of numerous chemical modifications 
may be carried out by known techniques, including but not limited to specific
chemical cleavage by cyanogen bromide, trypsin, chymotrypsin, papain, V8
protease, NaBH4, acetylation, formylation, oxidation, reduction, metabolic
synthesis in the presence of tunicamycin, and the like.

In specific embodiments, the megalomicin biosynthetic enzymes are
modified to include a fluorescent label. In other specific embodiments, the
megalomicin biosynthetic enzyme is modified to have a heterofunctional reagent;
such heterofunctional reagents can be used to crosslink the members of the
complex.

In addition, domains, analogs and derivatives of a megalomicin 
biosynthetic enzyme can be chemically synthesized. For example, a peptide 
corresponding to a portion of a megalomicin biosynthetic enzyme, which 
comprises the desired domain or which mediates the desired activity in vitro can
be synthesized by use of a peptide synthesizer. Furthermore, if desired,
nonclassical amino acids or chemical amino acid analogs can be introduced as a
substitution or addition into the megalomicin biosynthetic enzyme sequence.
Non-classical amino acids include but are not limited to the D-isomers of the
common amino acids, alpha-amino isobutyric acid, 4-aminobutyric acid,
2-aminobutyric acid, 6-amino hexanoic acid, Aib, 2-amino isobutyric acid,
3-amino propionic acid, ornithine, norleucine, norvaline, hydroxyproline,
sarcosine, citrulline, cysteic acid, t-butylglycine, t-butylalanine, phenylglycine,
cyclohexylalanine, β-alanine, fluoro-amino acids, designer amino acids such as
β-methyl amino acids, Cα-methyl amino acids, Nα-methyl amino acids, and amino
acid analogs in general. Furthermore, the amino acid can be D (dextrorotary) or L
(levorotary).

In cases where natural products are suspected of being mutant or are 
isolated from new species, the amino acid sequence of the megalomicin 
biosynthetic enzyme isolated from the natural source, as well as those expressed in
vitro, or from synthesized expression vectors in vivo or in vitro, can be determined
from analysis of the DNA sequence, or alternatively, by direct sequencing of the
isolated protein. Such analysis may be performed by manual sequencing or 
through use of an automated amino acid sequenator. 

The megalomicin biosynthetic enzyme proteins may also be analyzed by
hydrophilicity analysis (Hopp and Woods, Proc. Natl. Acad. Sci. USA
78:3824-3828 (1981)). A hydrophilicity profile can be used to identify the
hydrophobic and hydrophilic regions of the proteins, and help predict their
orientation in designing substrates for experimental manipulation, such as in
binding experiments, antibody synthesis, and the like. Secondary structural
analysis can also be done to identify regions of the megalomicin biosynthetic
enzyme that assume specific structures (Chou and Fasman, Biochemistry
13:222-23 (1974)). Manipulation, translation, secondary structure prediction,
hydrophilicity and hydrophobicity profiles, open reading frame prediction and
plotting, and determination of sequence homologies, can be accomplished using
computer software programs available in the art.
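
A hydrophilicity profile of the kind referred to above can be approximated by a sliding-window average, as in the following Python sketch. The per-residue values are those commonly tabulated for the Hopp and Woods scale and should be checked against the cited reference before use; the window length of six residues, the dictionary name `HOPP_WOODS`, and the function name `hydrophilicity_profile` are assumptions made for illustration.

```python
# Hydrophilicity values as commonly tabulated for the Hopp and Woods scale.
# Verify against the cited reference before relying on them.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3,
    "N": 0.2, "Q": 0.2, "G": 0.0, "P": 0.0, "T": -0.4,
    "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3, "V": -1.5,
    "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(sequence: str, window: int = 6) -> list[float]:
    """Mean hydrophilicity of each run of `window` consecutive residues."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [HOPP_WOODS[residue] for residue in seq]  # KeyError on unknown codes
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]
```

Windows with the highest averages mark candidate hydrophilic, surface-exposed regions, which may be of interest when selecting peptides for antibody production.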

Other methods of structural analysis including but not limited to X-ray
crystallography (Engstrom, Biochem. Exp. Biol. 11:7-13 (1974)), mass
spectroscopy and gas chromatography (Methods in Protein Science, J. Wiley and
Sons, New York, 1997), and computer modeling (Fletterick and Zoller, eds., 1986,
Computer Graphics and Molecular Modeling, in: Current Communications in
Molecular Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor Press,
New York) can also be employed.

The invention also provides an antibody, or a fragment or derivative
thereof, which immuno-specifically binds to a domain of megalomicin polyketide
synthase (PKS) or a megalomicin modification enzyme. In a specific
embodiment, an antibody which immuno-specifically binds to a domain of the
megalomicin biosynthetic enzyme encoded by a nucleic acid that hybridizes to a
nucleic acid having the nucleotide sequence set forth in SEQ ID NO:1, or a
fragment or derivative of said antibody containing the binding domain thereof is
provided. Preferably, the antibody is a monoclonal antibody.

The megalomicin biosynthetic enzyme protein and domains, fragments,
homologs and derivatives thereof may be used as immunogens to generate
antibodies which immunospecifically bind such immunogens. Such antibodies
include but are not limited to polyclonal, monoclonal, chimeric, single chain, Fab
fragments, and an Fab expression library.

Various procedures known in the art may be used for the production of
polyclonal antibodies to a megalomicin biosynthetic enzyme protein of the
invention, its domains, derivatives, fragments or analogs in accordance with the
methods of the present invention.

For production of the antibody, various host animals can be immunized by
injection with the native megalomicin biosynthetic enzyme protein or a synthetic
version, or a derivative of the foregoing, such as a cross-linked megalomicin
biosynthetic enzyme. Such host animals include but are not limited to rabbits,
mice, rats, and the like. Various adjuvants can be used to increase the
immunological response, depending on the host species, and include but are not
limited to Freund's (complete and incomplete), mineral gels such as aluminum
hydroxide, surface active substances such as lysolecithin, pluronic polyols,
polyanions, peptides, oil emulsions, dinitrophenol, and potentially useful human
adjuvants such as bacille Calmette-Guerin (BCG) and Corynebacterium parvum.

For preparation of monoclonal antibodies directed towards a megalomicin
biosynthetic enzyme or domains, derivatives, fragments or analogs thereof, any
technique that provides for the production of antibody molecules by continuous
cell lines in culture may be used. Such techniques include but are not restricted to
the hybridoma technique originally developed by Kohler and Milstein (Nature
256:495-497 (1975)), the trioma technique, the human B-cell hybridoma technique
(Kozbor et al., Immunology Today 4:72 (1983)), and the EBV hybridoma
technique to produce human monoclonal antibodies (Cole et al., in Monoclonal
Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96 (1985)). In an
additional embodiment, monoclonal antibodies can be produced in germ-free
animals (WO89/12690). Human antibodies may be used and can be obtained by
using human hybridomas (Cote et al., Proc. Natl. Acad. Sci. USA 80:2026-2030
(1983)) or by transforming human B cells with EBV virus in vitro (Cole et al., in
Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96
(1985)). Techniques developed for the production of "chimeric antibodies"
(Morrison et al., Proc. Natl. Acad. Sci. USA 81:6851-6855 (1984); Neuberger et
al., Nature 312:604-608 (1984); Takeda et al., Nature 314:452-454 (1985)) by
splicing the genes from a mouse antibody molecule specific for the megalomicin
biosynthetic enzyme protein together with genes from a human antibody molecule
of appropriate biological activity can be used; such antibodies are within the scope
of this invention.

Techniques described for the production of single chain antibodies (U.S.
patent 4,946,778) can be adapted to produce megalomicin biosynthetic
enzyme-specific single chain antibodies. An additional embodiment utilizes the
techniques described for the construction of Fab expression libraries (Huse et al.,
Science 246:1275-1281 (1989)) to allow rapid and easy identification of
monoclonal Fab fragments with the desired specificity for megalomicin
biosynthetic enzyme, or domains, derivatives, or analogs thereof. Non-human
antibodies can be "humanized" by known methods (see, e.g., U.S. Patent No.
5,225,539).

Antibody fragments that contain the idiotypes of a megalomicin
biosynthetic enzyme can be generated by techniques known in the art in
accordance with the methods of the present invention. For example, such
fragments include but are not limited to: the F(ab')2 fragment which can be
produced by pepsin digestion of the antibody molecule; the Fab' fragments that
can be generated by reducing the disulfide bridges of the F(ab')2 fragment; the Fab
fragments that can be generated by treating the antibody molecule with papain
and a reducing agent; and Fv fragments.

In the production of antibodies, screening for the desired antibody can be
accomplished by techniques known in the art in accordance with the methods of
the present invention, e.g., ELISA (enzyme-linked immunosorbent assay). To
select antibodies specific to a particular domain of the megalomicin biosynthetic
enzyme, one may assay generated hybridomas for a product that binds to the
fragment of a megalomicin biosynthetic enzyme that contains such a domain.

The foregoing antibodies can be used in methods known in the art relating
to the localization and/or quantitation of megalomicin biosynthetic enzyme
proteins, e.g., for imaging these proteins or measuring levels thereof in samples, in
accordance with the methods of the present invention.

Section V: Heterologous Expression of the Megalomicin Biosynthetic Genes

In one important embodiment, the invention provides methods for the
heterologous expression of one or more of the megalomicin biosynthetic genes
and recombinant DNA expression vectors useful in the method. For purposes of 
the invention, any host cell other than Micromonospora megalomicea is a 
heterologous host cell. Thus, included within the scope of the invention in
addition to isolated nucleic acids encoding domains, modules, or proteins of the
megalomicin PKS and modification enzymes, are recombinant expression vectors 
that include such nucleic acids. The term expression vector refers to a nucleic acid 
that can be introduced into a host cell or cell-free transcription and translation
system. An expression vector can be maintained permanently or transiently in a
cell, whether as part of the chromosomal or other DNA in the cell or in any 
cellular compartment, such as a replicating vector in the cytoplasm. An expression 
vector also comprises a promoter that drives expression of an RNA, which
typically is translated into a polypeptide in the cell or cell extract. For efficient
translation of RNA into protein, the expression vector also typically contains a 
ribosome-binding site sequence positioned upstream of the start codon of the 
coding sequence of the gene to be expressed. Other elements, such as enhancers, 
secretion signal sequences, transcription termination sequences, and one or more
marker genes by which host cells containing the vector can be identified and/or
selected, may also be present in an expression vector. Selectable markers, i.e., 
genes that confer antibiotic resistance or sensitivity, are preferred and confer a 
selectable phenotype on transformed cells when the cells are grown in an 
appropriate selective medium. 

The various components of an expression vector can vary widely,
depending on the intended use of the vector and the host cell(s) in which the
vector is intended to replicate or drive expression. Expression vector components 
suitable for the expression of genes and maintenance of vectors in E. coli, yeast, 
Streptomyces, and other commonly used cells are widely known and commercially
available. For example, suitable promoters for inclusion in the expression vectors
of the invention include those that function in eucaryotic or procaryotic host cells. 
Promoters can comprise regulatory sequences that allow for regulation of 
expression relative to the growth of the host cell or that cause the expression of a 
gene to be turned on or off in response to a chemical or physical stimulus. For E.
coli and certain other bacterial host cells, promoters derived from genes for
biosynthetic enzymes, antibiotic-resistance conferring enzymes, and phage
proteins can be used and include, for example, the galactose, lactose (lac),
maltose, tryptophan (trp), beta-lactamase (bla), bacteriophage lambda PL, and T5
promoters. In addition, synthetic promoters, such as the tac promoter (U.S. Patent
No. 4,551,433), can also be used.

Thus, recombinant expression vectors contain at least one expression 
system, which, in turn, is composed of at least a portion of the megalomicin PKS 
and/or other megalomicin biosynthetic gene coding sequences operably linked to a
promoter and optionally termination sequences that operate to effect expression of
the coding sequence in compatible host cells. The host cells are modified by 
transformation with the recombinant DNA expression vectors of the invention to 
contain the expression system sequences either as extrachromosomal elements or
integrated into the chromosome. The resulting host cells of the invention are
useful in methods to produce PKS and post-PKS modification enzymes as well as
polyketides and antibiotics and other useful compounds derived therefrom. 

Preferred host cells for purposes of selecting vector components for 
expression vectors of the present invention include fungal host cells such as yeast
and procaryotic host cells such as E. coli and Streptomyces, but mammalian host
cells can also be used. In hosts such as yeasts, plants, or mammalian cells that 
ordinarily do not produce polyketides, it may be necessary to provide, also 
typically by recombinant means, suitable holo-ACP synthases to convert the 
recombinantly produced PKS to functionality. Provision of such enzymes is
described, for example, in PCT publication Nos. WO 97/13845 and WO 98/27203,
each of which is incorporated herein by reference. Particularly preferred host cells 
for purposes of the present invention are Streptomyces and Saccharopolyspora 
host cells, as discussed in greater detail below. 

In a preferred embodiment, the expression vectors of the invention are
used to construct a heterologous recombinant Streptomyces host cell that expresses
a recombinant PKS of the invention. Streptomyces is a convenient host for 
expressing polyketides, because polyketides are naturally produced in certain 
Streptomyces species, and Streptomyces cells generally produce the precursors 
needed to form the desired polyketide. Those of skill in the art will recognize that,
if a Streptomyces host cell produces any portion of a PKS enzyme or produces a
polyketide modification enzyme, the recombinant vector need drive expression of 
only those genes constituting the remainder of the desired PKS enzyme or other 
polyketide-modifying enzymes. Thus, such a vector may comprise only a single 
ORF, with the desired remainder of the polypeptides constituting the PKS
provided by the genes on the host cell chromosomal DNA.

If a Streptomyces or other host cell ordinarily produces polyketides, it may 
be desirable to modify the host so as to prevent the production of endogenous 
polyketides prior to its use to express a recombinant PKS of the invention. Such
modified hosts include S. coelicolor CH999 and similarly modified S. lividans
described in U.S. Patent No. 5,672,491, and PCT publication Nos. WO 95/08548 
and WO 96/40968, incorporated herein by reference. In such hosts, it may not be 
necessary to provide enzymatic activities for all of the desired post-translational
modifications of the enzymes that make up the recombinantly produced PKS,
because the host naturally expresses such enzymes. In particular, these hosts
generally contain holo-ACP synthases that provide the phosphopantetheinyl
residue needed for functionality of the PKS. 

The invention provides a wide variety of expression vectors for use in
Streptomyces. The replicating expression vectors of the present invention include,
for example and without limitation, those that comprise an origin of replication
from a low copy number vector, such as SCP2* (see Hopwood et al., Genetic
Manipulation of Streptomyces: A Laboratory Manual (The John Innes Foundation,
Norwich, U.K., 1985); Lydiate et al., 1985, Gene 35: 223-235; and Kieser and
Melton, 1988, Gene 65: 83-91, each of which is incorporated herein by reference),
SLP1.2 (Thompson et al., 1982, Gene 20: 51-62, incorporated herein by
reference), and pSG5(ts) (Muth et al., 1989, Mol. Gen. Genet. 219: 341-348, and
Bierman et al., 1992, Gene 116: 43-49, each of which is incorporated herein by
reference), or a high copy number vector, such as pIJ101 and pJV1 (see Katz et
al., 1983, J. Gen. Microbiol. 129: 2703-2714; Vara et al., 1989, J. Bacteriol. 171:
5782-5781; and Servin-Gonzalez, 1993, Plasmid 30: 131-140, each of which is
incorporated herein by reference). For non-replicating and integrating vectors and
generally for any vector, it is useful to include at least an E. coli origin of
replication, such as from pUC, p1P, p1I, and pBR. For phage based vectors, the
phage phiC31 and its derivative KC515 can be employed (see Hopwood et al.,
supra). Also, plasmid pSET152, plasmid pSAM, plasmids pSE101 and pSE211,
all of which integrate site-specifically in the chromosomal DNA of S. lividans, can
be employed for purposes of the present invention.

The Streptomyces recombinant expression vectors of the invention
typically comprise one or more selectable markers, including antibiotic
resistance-conferring genes selected from the group consisting of the ermE
(confers resistance to erythromycin and lincomycin), tsr (confers resistance to
thiostrepton), aadA (confers resistance to spectinomycin and streptomycin), aacC4
(confers resistance to apramycin, kanamycin, gentamicin, geneticin (G418), and
neomycin), hyg (confers resistance to hygromycin), and vph (confers resistance to
viomycin) genes. Alternatively, several polyketides are naturally colored, and
this characteristic can provide a built-in marker for identifying cells.
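
The marker list above can be restated as a small lookup table. The following
Python sketch is illustrative only: the gene names and antibiotics are taken from
the preceding paragraph, while the function names (markers_for,
mutually_selectable) are hypothetical helpers introduced here. Sharing no listed
antibiotic is treated as a rough proxy for independent selection; it does not
rule out cross-resistance that the text does not mention.

```python
# Selectable markers and the antibiotics they confer resistance to,
# as listed in the paragraph above.
RESISTANCE_MARKERS = {
    "ermE": ("erythromycin", "lincomycin"),
    "tsr": ("thiostrepton",),
    "aadA": ("spectinomycin", "streptomycin"),
    "aacC4": ("apramycin", "kanamycin", "gentamicin",
              "geneticin (G418)", "neomycin"),
    "hyg": ("hygromycin",),
    "vph": ("viomycin",),
}


def markers_for(antibiotic):
    """Return, sorted, the marker genes listed as conferring resistance."""
    wanted = antibiotic.lower()
    return sorted(
        gene
        for gene, drugs in RESISTANCE_MARKERS.items()
        if wanted in (drug.lower() for drug in drugs)
    )


def mutually_selectable(marker_a, marker_b):
    """True if the two markers share no listed antibiotic (hypothetical check)."""
    drugs_a = set(RESISTANCE_MARKERS[marker_a])
    drugs_b = set(RESISTANCE_MARKERS[marker_b])
    return drugs_a.isdisjoint(drugs_b)
```

For instance, a two-plasmid design could pair tsr with hyg, since the two
markers share no listed antibiotic.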

Megalomicins are currently produced only by the relatively genetically
intractable host Micromonospora megalomicea. This bacterium has not been
commonly used in the fermentation industry for the large-scale production of
antibiotics, and methods for high level production of megalomicin and its analogs
are needed. In contrast, the streptomycete bacteria have been widely used for
almost 50 years and are excellent hosts for production of megalomicin and its
analogs. Streptomyces lividans and S. coelicolor have been developed for the
expression of heterologous PKS systems. These organisms can stably maintain
cloned heterologous PKS genes, express them at high levels under controlled
conditions, and modify the corresponding PKS proteins (e.g., by
phosphopantetheinylation) so that they are capable of production of the polyketide
they encode. Furthermore, these hosts contain the necessary pathways to produce
the substrates required for polyketide synthesis, e.g., propionyl-CoA and
methylmalonyl-CoA. A wide variety of cloning and expression vectors are
available for these hosts, as are methods for the introduction and stable
maintenance of large segments of foreign DNA. Relative to Micromonospora spp.,
S. lividans and S. coelicolor grow well on a number of media and have been
adapted for high level production of polyketides in fermentors. If production levels
are low, a number of rational approaches are available to improve yield (see
Hosted and Baltz, 1996, Trends Biotechnol. 14(7):245-50, incorporated herein by
reference). Empirical methods to increase the titers of these macrolides, long since
proven effective for numerous bacterial polyketides, can also be employed.

Preferred Streptomyces host cell/vector combinations of the invention
include S. coelicolor CH999 and S. lividans K4-114 host cells, which have been
modified so as not to produce the polyketide actinorhodin, and expression vectors
derived from the pRM1 and pRM5 vectors, as described in U.S. Patent Nos.
5,830,750 and 6,022,731 and U.S. patent application Serial No. 09/181,833, filed
28 Oct. 1998, each of which is incorporated herein by reference. These vectors are
particularly preferred in that they contain promoters compatible with numerous
and diverse Streptomyces spp. Particularly useful promoters for Streptomyces host
cells include those from PKS gene clusters that result in the production of
polyketides as secondary metabolites, including promoters from aromatic (Type II)
PKS gene clusters. Examples of Type II PKS gene cluster promoters are act gene
promoters and tcm gene promoters; examples of Type I PKS gene cluster
promoters are the promoters of the spiramycin PKS genes and DEBS genes. The
present invention also provides the megalomicin biosynthetic gene promoters in
recombinant form. These promoters can be used to drive expression of the
megalomicin biosynthetic genes or any other coding sequence of interest in host
cells in which the promoter functions, particularly Micromonospora megalomicea
and generally any Streptomyces species.

As described above, particularly useful control sequences are those that
alone or together with suitable regulatory systems activate expression during
transition from growth to stationary phase in the vegetative mycelium. The
promoter contained in the aforementioned plasmid pRM5, i.e., the actI/actIII
promoter pair and the actII-ORF4 activator gene, is particularly preferred. Other
useful Streptomyces promoters include without limitation those from the ermE
gene and the melC1 gene, which act constitutively, and the tipA gene and the merA
gene, which can be induced at any growth stage. In addition, the T7 RNA
polymerase system has been transferred to Streptomyces and can be employed in
the vectors and host cells of the invention. In this system, the coding sequence for
the T7 RNA polymerase is inserted into a neutral site of the chromosome or in a
vector under the control of the inducible merA promoter, and the gene of interest is
placed under the control of the T7 promoter. As noted above, one or more
activator genes can also be employed to enhance the activity of a promoter.
Activator genes in addition to the actII-ORF4 gene described above include dnrI,
redD, and ptpA genes (see U.S. patent application Serial No. 09/181,833, supra).

To provide a preferred host cell and vector for purposes of the invention,
the megalomicin biosynthetic genes are placed on a recombinant expression vector
and transferred to the non-macrolide producing hosts Streptomyces lividans
K4-114 and S. coelicolor CH999. Transformation of S. lividans K4-114 or S.
coelicolor CH999 with this expression vector results in a strain which produces
detectable amounts of megalomicin as determined by analysis of extracts by
LC/MS. As noted above, the present invention also provides recombinant DNA
compounds in which the encoded megalomicin module 1 KS domain is
inactivated (the KS1° mutation). The introduction into Streptomyces lividans or S.
coelicolor of a recombinant expression vector of the invention that encodes a
megalomicin PKS with a KS1° domain produces a host cell useful for making
polyketides by a process known as diketide feeding. The resulting host cells can be
fed or supplied with N-acylcysteamine thioesters of precursor molecules to
prepare megalomicin derivatives. Such cells of the invention are especially useful
in the production of 13-substituted-6-deoxyerythronolide B compounds in
recombinant host cells. Preferred compounds of the invention include those
compounds in which the substituent at the 13-position is propyl, vinyl, propargyl,
other lower alkyl, and substituted alkyl. In a preferred embodiment, the meg PKS
is produced from a recombinant construct in which the megAIII gene has been
altered to abolish the regions of identical coding sequence it otherwise shares with
the megAI gene, or a hybrid PKS is employed in which the megAIII gene product
has been replaced by the oleAIII gene product. Recombinant oleAIII genes are
described in, for example, PCT patent publication No. 00/026349 and U.S. patent
application Serial No. 09/428,517, filed 28 Oct. 1999, both of which are
incorporated herein by reference.

The recombinant host cells of the invention can express all of the
megalomicin biosynthetic genes or only a subset of the same. For example, if only
the genes for the megalomicin PKS are expressed in a host cell that otherwise does
not produce polyketide modifying enzymes that can act on the polyketide
produced, then the host cell produces unmodified polyketides, called macrolide
aglycones. Such macrolide aglycones can be hydroxylated and glycosylated by
adding them to the fermentation of a strain such as, for example, Streptomyces
antibioticus or Saccharopolyspora erythraea, that contains the requisite
modification enzymes.

There are a wide variety of diverse organisms that can modify macrolide
aglycones to provide compounds with, or that can be readily modified to have,
useful activities. For example, as shown in Figure 5, Saccharopolyspora erythraea
can convert 6-dEB to a variety of useful compounds. The erythronolide 6-dEB is
converted by the eryF gene product to erythronolide B, which is, in turn,
glycosylated by the eryBV gene product to obtain 3-O-mycarosylerythronolide B,
which contains L-mycarose at C-3. The eryCIII gene product then converts this
compound to erythromycin D by glycosylation with D-desosamine at C-5.
Erythromycin D, therefore, differs from 6-dEB through glycosylation and by the
addition of a hydroxyl group at C-6. Erythromycin D can be converted to
erythromycin B in a reaction catalyzed by the eryG gene product by methylating
the L-mycarose residue at C-3. Erythromycin D is converted to erythromycin C by
the addition of a hydroxyl group at C-12 in a reaction catalyzed by the eryK gene
product. Erythromycin A is obtained from erythromycin C by methylation of the
mycarose residue in a reaction catalyzed by the eryG gene product. The
unmodified megalomicin compounds provided by the present invention, such as,
for example, the 6-dEB or 6-dEB analogs, produced in Streptomyces lividans, can
be provided to cultures of S. erythraea and converted to the corresponding
derivatives of erythromycins A, B, C, and D in accordance with the procedure
provided in the examples below. To ensure that only the desired compound is
produced, one can use an S. erythraea eryA mutant that is unable to produce
6-dEB but can still carry out the desired conversions (Weber et al., 1985, J.
Bacteriol. 164(1): 425-433). Also, one can employ other mutant strains, such as
eryB, eryC, eryG, and/or eryK mutants, or mutant strains having mutations in
multiple genes, to accumulate a preferred compound. The conversion can also be
carried out in large fermentors for commercial production.
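
The conversion steps just described form a small directed graph, and the effect
of a given mutant background can be reasoned about by removing the steps that
depend on an inactive gene. The Python sketch below is a simplified illustration
under that assumption: it encodes only the six steps named in the paragraph
above, and the function names (reachable, end_products) are hypothetical. It is
not a model of the full pathway shown in Figure 5.

```python
from collections import deque

# (substrate, gene, product) for each conversion named in the text above.
STEPS = [
    ("6-dEB", "eryF", "erythronolide B"),
    ("erythronolide B", "eryBV", "3-O-mycarosylerythronolide B"),
    ("3-O-mycarosylerythronolide B", "eryCIII", "erythromycin D"),
    ("erythromycin D", "eryG", "erythromycin B"),
    ("erythromycin D", "eryK", "erythromycin C"),
    ("erythromycin C", "eryG", "erythromycin A"),
]


def reachable(start, inactive_genes=()):
    """Breadth-first search over the steps whose gene is still active."""
    blocked = set(inactive_genes)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for substrate, gene, product in STEPS:
            if substrate == current and gene not in blocked and product not in seen:
                seen.add(product)
                queue.append(product)
    return seen


def end_products(start, inactive_genes=()):
    """Reachable compounds that no remaining active step consumes."""
    blocked = set(inactive_genes)
    consumed = {substrate for substrate, gene, _ in STEPS if gene not in blocked}
    return sorted(reachable(start, blocked) - consumed)
```

Under these assumptions, feeding 6-dEB to a strain with an inactive eryG gene
ends at erythromycin C, and an inactive eryK gene ends at erythromycin B, which
matches the use of such mutants to accumulate a preferred compound.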

Moreover, there are other useful organisms that can be employed to
hydroxylate and/or glycosylate the compounds of the invention. As described
above, the organisms can be mutants unable to produce the polyketide normally
produced in that organism, the fermentation can be carried out on plates or in large
fermentors, and the compounds produced can be chemically altered after
fermentation. Thus, Streptomyces venezuelae, which produces picromycin,
contains enzymes that can transfer a desosaminyl group to the C-5 hydroxyl and a
hydroxyl group to the C-12 position. In addition, S. venezuelae contains a
glucosylation activity that glucosylates the 2'-hydroxyl group of the desosamine
sugar. This latter modification reduces antibiotic activity, but the glucosyl residue
is removed by enzymatic action prior to release of the polyketide from the cell.
Another organism, S. narbonensis, contains the same modification enzymes as S.
venezuelae, except the C-12 hydroxylase. Thus, the present invention provides the
compounds produced by hydroxylation and glycosylation of the macrolide
aglycones of the invention by action of the enzymes endogenous to S. narbonensis
and S. venezuelae.

Other organisms suitable for making compounds of the invention include
Micromonospora megalomicea (discussed above), Streptomyces antibioticus, S.
fradiae, and S. thermotolerans. S. antibioticus produces oleandomycin and
contains enzymes that hydroxylate the C-6 and C-12 positions, glycosylate the C-3
hydroxyl with oleandrose and the C-5 hydroxyl with desosamine, and form an
epoxide at C-8-C-8a. S. fradiae contains enzymes that glycosylate the C-5
hydroxyl with mycaminose and then the 4'-hydroxyl of mycaminose with
mycarose, forming a disaccharide. S. thermotolerans contains the same activities
as S. fradiae, as well as acylation activities. Thus, the present invention provides
the compounds produced by hydroxylation and glycosylation of the macrolide
aglycones of the invention by action of the enzymes endogenous to S. antibioticus,
S. fradiae, and S. thermotolerans.
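
The modification activities attributed to these organisms can be tabulated so
that a bioconversion host is chosen by the modifications it offers. The following
Python sketch is only a restatement of the two preceding paragraphs; the activity
labels and the helper name hosts_with are hypothetical conventions introduced
for this illustration.

```python
# Modification activities per organism, as described in the text above.
ACTIVITIES = {
    "S. venezuelae": {
        "C-5 desosaminylation",
        "C-12 hydroxylation",
        "2'-glucosylation of desosamine",
    },
    "S. antibioticus": {
        "C-6 hydroxylation",
        "C-12 hydroxylation",
        "C-3 oleandrosylation",
        "C-5 desosaminylation",
        "C-8/C-8a epoxidation",
    },
    "S. fradiae": {
        "C-5 mycaminosylation",
        "4'-mycarosylation of mycaminose",
    },
}
# S. narbonensis: same enzymes as S. venezuelae except the C-12 hydroxylase.
ACTIVITIES["S. narbonensis"] = ACTIVITIES["S. venezuelae"] - {"C-12 hydroxylation"}
# S. thermotolerans: same activities as S. fradiae plus acylation.
ACTIVITIES["S. thermotolerans"] = ACTIVITIES["S. fradiae"] | {"acylation"}


def hosts_with(*required):
    """Return, sorted, the organisms that offer every required activity."""
    needed = set(required)
    return sorted(host for host, acts in ACTIVITIES.items() if needed <= acts)
```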

The present invention also provides methods and genetic constructs for
producing the glycosylated and/or hydroxylated compounds of the invention
directly in the host cell of interest. Thus, the recombinant genes of the invention,
which include recombinant megAI, megAII, and megAIII genes with one or more
deletions and/or insertions, including replacements of a megA gene fragment with
a gene fragment from a heterologous PKS gene (as discussed in the next Section),
can be included on expression vectors suitable for expression of the encoded gene
products in Saccharopolyspora erythraea, Streptomyces antibioticus, S.
venezuelae, S. narbonensis, Micromonospora megalomicea, S. fradiae, and S.
thermotolerans.

A number of erythromycin high-producing strains of Saccharopolyspora
erythraea and Streptomyces fradiae have been developed, and in a preferred
embodiment, the megalomicin PKS and/or other megalomicin biosynthetic genes
are introduced into such strains (or erythromycin non-producing mutants thereof)
to provide the corresponding modified megalomicin compounds in high yields.
Those of skill in the art will appreciate that S. erythraea contains the desosamine
and mycarose biosynthetic and transfer genes as well as DEBS, which, as noted
above, makes the same macrolide aglycone, 6-dEB, as the megalomicin PKS. S.
erythraea does not make megosamine or contain its corresponding transferase
gene, and does not contain the acylation gene of Micromonospora megalomicea.
Finally, the S. erythraea eryG gene product converts mycarose to cladinose, a
conversion that does not occur in M. megalomicea. Thus, the present invention
provides a wide variety of S. erythraea recombinant host cells, including, for
example, those that contain:

(i) wild-type erythromycin biosynthetic genes with recombinant
megosamine biosynthetic and transfer genes, with and without megalomicin
acylation genes;

(ii) wild-type erythromycin biosynthetic genes except eryG, with
recombinant megosamine biosynthetic and transfer genes, with and without
megalomicin acylation genes; and

(iii) as in (i) and (ii), except that the eryA genes are inactive or deleted and
recombinant megA genes have been introduced.
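
Read together, items (i) through (iii) describe three independent choices: the
source of the PKS genes (eryA or megA), whether eryG is active, and whether the
megalomicin acylation genes are present. A minimal Python sketch enumerating the
resulting designs follows; the dictionary keys and the function name
enumerate_host_designs are hypothetical labels, and the count of eight reflects
only this reading of the list.

```python
from itertools import product


def enumerate_host_designs():
    """List every S. erythraea host design implied by items (i)-(iii)."""
    designs = []
    for pks, eryg_active, acylation in product(
        ("eryA", "megA"),   # (iii) swaps the eryA genes for recombinant megA genes
        (True, False),      # (i) keeps eryG, (ii) omits it
        (False, True),      # without / with megalomicin acylation genes
    ):
        designs.append(
            {
                "pks_genes": pks,
                "eryG_active": eryg_active,
                "megosamine_genes": True,  # present in every listed design
                "acylation_genes": acylation,
            }
        )
    return designs
```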

The invention provides other S. erythraea strains as well, including those 
in which any one or more of the erythromycin biosynthetic genes have been 
deleted or otherwise rendered inactive and in which at least one megalomicin 
biosynthetic gene has been introduced. 

For example, the present invention enables one to express the megosamine
genes in a Saccharopolyspora erythraea eryG mutant in which the erythromycin C
made by this mutant is converted to megalomicin A. Alternatively, one could use
an erythromycin C high-producing strain of S. erythraea in biotransformation
methods in which the erythromycin C is fed to a Streptomyces lividans strain
carrying only the megosamine biosynthesis and glycosyltransferase genes. As
another alternative, one could use a strain of S. lividans that carries suitable
erythromycin production genes along with the daunosamine biosynthesis genes
plus geneX and geneY of Figure 5, or all of the megosamine biosynthesis genes, to
produce megalomicin A.

All or some of the megalomicin gene cluster can be easily cloned under
control of a suitable promoter in pCK7 or pSET152, either in one or two plasmids,
and introduced into the Saccharopolyspora erythraea eryG mutant. The
actII-ORF4/actIp system and the phiC31/int system in pSET function well in this
organism (see Rowe et al., 1998, Gene 216:215-23, incorporated herein by
reference). Alternatively, the megosamine biosynthesis genes are introduced into
Streptomyces lividans on the same plasmids and the production of megalomicin A
or its precursor is mediated by bioconversion, done by feeding erythronolide B,
3-alpha-mycarosylerythronolide B, erythromycin D, or erythromycin C to the S.
lividans strain.

Lack of adequate resistance to megalomicin A in S. erythraea or S.
lividans is not expected, because both organisms have MLS resistance genes
(ermE and mgt/lrm, respectively), which confer resistance to several 14-membered
macrolides (see Cundliffe, 1989, Annu. Rev. Microbiol. 45:207-33; Jenkins and
Cundliffe, 1991, Gene 705:55-62; and Cundliffe, 1992, Gene 115:75-84, each of
which is incorporated herein by reference). One can also readily determine the
level of resistance of the S. erythraea eryG mutant and the S. lividans host cells to
megalomicin A, both in plate tests and in liquid medium. One can repeat the
bioconversion method using an eryG mutant of a high erythromycin A producing
S. erythraea strain (or an eryB or eryC mutant, as necessary) to determine the level
at which megalomicin A can be produced. Furthermore, if experience shows that
high level megalomicin A production requires a higher level of resistance to this
macrolide than is present in S. erythraea or S. lividans, the necessary megalomicin
self-resistance genes will be cloned from M. megalomicea and moved into either
one of the heterologous hosts. This will be straightforward work, since
self-resistance genes are usually found in the cluster of macrolide biosynthesis
genes and can be identified by their homology to known macrolide resistance
genes and/or by the resistance phenotype they impart to a strain that normally is
sensitive.

Alternatively, geneX and geneY (Figure 5) can be added to cassettes
containing the relevant daunosamine (dnm) biosynthesis genes (Figure 5) to
provide the ability to make TDP-megosamine in vivo and attach it to an
erythromycin aglycone. The TDP-daunosamine biosynthesis genes can be
recloned from Streptomyces peucetius on two compatible and mutually selectable
plasmids. When an S. lividans strain containing these two plasmids and the dnmS
gene for TDP-daunosamine glycosyltransferase is grown in the presence of added
epsilon-rhodomycinone, its glycoside with L-daunosamine, called rhodomycin D,
is produced in good yield. Thus, bioconversion of one of the erythromycins to
megalomicin A should be observed when geneX and geneY are present. One can
construct all five combinations (the two N-dimethyltransferase genes and the three
glycosyltransferase genes) to discriminate geneX and geneY from those connected
with mycarose and desosamine biosynthesis and attachment in the megalomicin
pathway.

Because the timing of megosamine addition is unknown, one can test
erythronolide B, 3-alpha-mycarosylerythronolide B, erythromycin D, and
erythromycin C as substrates provided to a strain that expresses the megosamine
biosynthetic and transferase genes. There is no need to test the C3''' and/or C4'''
acylated metabolites like megalomicin C1, because these metabolites are made
from megalomicin A and not the converse, based on the precedents in the
biosynthesis of tylosin (see Arisawa et al., 1994, Appl. Environ. Microbiol. 60:
2657-2661), carbomycin (see Epp et al., 1989, Gene 55:293-301), and
midecamycin (see Hara and Hutchinson, 1992, J. Bacteriol. 174: 5141-5144). If
C-6 glycosylation of erythronolide B or 3-alpha-mycarosylerythronolide B (Figure
5) happens before addition of desosamine to C-5, then the erythromycin genes
might not be able to complete formation of megalomicin A from some mono- or
diglycoside if the erythromycin glycosyltransferases cannot tolerate a C-6
glycoside. Although unexpected, such an outcome could be circumvented in
accordance with the methods of the invention by cloning further megalomicin
biosynthesis genes (specifically, the necessary deoxysugar biosynthesis and
attachment genes) into the appropriate S. erythraea background or into S. lividans
to create a recombinant strain that produces megalomicin A.

The acyltransferase gene that adds acetate or propionate to the C3''' or
C4''' positions of mycarose in megalomicin B, C1, and C2 (Figure 3) is contained
within the cosmids of the invention and can be identified by scanning the sequence
data for the megalomicin gene cluster to locate homologs of carE and mdmB or
their acyA homologs from the tylosin producer. The carE and acyA genes govern
C4''' acylation in the carbomycin and tylosin pathways, respectively. The
megalomicin homolog has the equivalent function in megalomicin biosynthesis
(but is specific for C3''' and C4''' acylation). The gene can be cloned under
control of a suitable promoter and introduced into S. lividans to produce the
desired acyl derivative of megalomicin A. Alternatively, introduction of the carE
gene can form megalomicin B. This gene can be cloned from the carbomycin,
spiramycin, or tylosin producers.

If the amount of megalomicin produced by an S. erythraea or S. lividans or
other recombinant host cell is less than desired, yield can be improved by
optimizing the growth medium and fermentation conditions, by increasing
expression of the gene(s) that appear to be rate limiting, based on the level of
pathway intermediates that are accumulated by the recombinant strain constructed,
and by reconstructing the ery, dnm, and megalomicin biosynthesis genes on
vectors like pSET152 that can be integrated into the genome to provide a more
stable recombinant strain for strain improvement.

In another embodiment, the present invention provides recombinant
vectors encoding one or more of the megosamine, desosamine, and mycarose
biosynthetic and transfer genes and heterologous host cells comprising those
vectors. In this embodiment of the invention, the heterologous host cell is typically
a cell that is unable to produce the sugar and transfer it to a polyketide unless the
vector of the invention is introduced. For example, neither Streptomyces lividans
nor S. coelicolor is naturally capable of making megosamine, desosamine, or
mycarose or transferring those moieties to a polyketide. However, the present
invention provides recombinant Streptomyces lividans and S. coelicolor host cells
that are capable of making megosamine, desosamine, and/or mycarose and
transferring those moieties to a polyketide.

Moreover, additional recombinant gene products can be expressed in the
host cell to improve production of a desired polyketide. As but one non-limiting
example, certain of the recombinant PKS proteins of the invention may produce a
polyketide other than or in addition to the predicted polyketide, because the
polyketide is cleaved from the PKS by the thioesterase (TE) domain in module 6
prior to processing by other domains on the PKS, in particular, any KR, DH,
and/or ER domains in module 6. The production of the predicted polyketide can
be increased in such instances by deleting the TE domain coding sequences from
the gene and, optionally, expressing the TE domain as a separate protein. See
Gokhale et al., Feb. 1999, "Mechanism and specificity of the terminal thioesterase
domain from the erythromycin polyketide synthase," Chem. & Biol. 6: 117-125,
incorporated herein by reference.

Thus, in one important aspect, the present invention provides methods,
expression vectors, and recombinant host cells that enable the production of
megalomicin and hydroxylated and glycosylated derivatives of megalomicin in
heterologous host cells. The present invention also provides methods for making a
wide variety of polyketides derived in part from the megalomicin PKS or other
biosynthetic genes, as described in the following Section.

Section VI: Hybrid PKS Genes

The present invention provides recombinant DNA compounds encoding
each of the domains of each of the modules of the megalomicin PKS as well as the
other megalomicin biosynthetic enzymes. The availability of these compounds
permits their use in recombinant procedures for production of desired portions of
the megalomicin PKS fused to or expressed in conjunction with all or a portion of
a heterologous PKS and, optionally, one or more polyketide modification
enzymes. These compounds also permit the modification of polyketides with the
various megalomicin modification enzymes. The resulting hybrid PKS can then be
expressed in a host cell to produce a desired polyketide or modified form thereof.

Thus, in accordance with the methods of the invention, a portion of the
megalomicin biosynthetic gene coding sequence that encodes a particular activity
can be isolated and manipulated, for example, to replace the corresponding region
in a different modular PKS gene or modification enzyme gene. In addition, coding
sequences for individual proteins, modules, domains, and portions thereof of the
megalomicin PKS can be ligated into suitable expression systems and used to
produce the portion of the protein encoded. The resulting protein can be isolated
and purified or can be employed in situ to effect polyketide synthesis.
Depending on the host for the recombinant production of the domain, module,
protein, or combination of proteins, suitable control sequences such as promoters,
termination sequences, enhancers, and the like are ligated to the nucleotide
sequence encoding the desired protein in the construction of the expression vector,
as described above.

In one important embodiment, the invention thus provides hybrid PKS
enzymes and the corresponding recombinant DNA compounds that encode those
hybrid PKS enzymes. For purposes of the invention, a hybrid PKS is a
recombinant PKS that comprises all or part of one or more extender modules,
loading module, and/or thioesterase/cyclase domain of a first PKS and all or part
of one or more extender modules, loading module, and/or thioesterase/cyclase
domain of a second PKS. In one preferred embodiment, the first PKS is most but
not all of the megalomicin PKS, and the second PKS is only a portion of a
non-megalomicin PKS. An illustrative example of such a hybrid PKS includes a
megalomicin PKS in which the megalomicin PKS loading module has been
replaced with a loading module of another PKS. Another example of such a hybrid
PKS is a megalomicin PKS in which the AT domain of extender module 3 is
replaced with an AT domain that binds only malonyl CoA. In another preferred
embodiment, the first PKS is most but not all of a non-megalomicin PKS, and the
second PKS is only a portion of the megalomicin PKS. An illustrative example of
such a hybrid PKS includes a rapamycin PKS in which an AT specific for malonyl
CoA is replaced with the AT from the megalomicin PKS specific for
methylmalonyl CoA. Other illustrative hybrid PKSs of the invention are described
below.

Those of skill in the art will recognize that all or part of either the first or
second PKS in a hybrid PKS of the invention need not be isolated from a naturally
occurring source. For example, only a small portion of an AT domain determines
its specificity. See PCT patent application No. WO US99/15047, and Lau et al.,
infra, incorporated herein by reference. The state of the art in DNA synthesis
allows the artisan to construct de novo DNA compounds of size sufficient to
construct a useful portion of a PKS module or domain. Thus, the desired
derivative coding sequences can be synthesized using standard solid phase
synthesis methods such as those described by Jaye et al., 1984, J. Biol. Chem. 259:
6331, and instruments for automated synthesis are available commercially from,
for example, Applied Biosystems, Inc. For purposes of the invention, such
synthetic DNA compounds are deemed to be a portion of a PKS.

With this general background regarding hybrid PKSs of the invention, one
can better appreciate the benefit provided by the DNA compounds of the invention
that encode the individual domains, modules, and proteins that comprise the
megalomicin PKS. As described above, the megalomicin PKS comprises a
loading module; six extender modules, each composed of a KS, AT, ACP, and
zero, one, two, or three KR, DH, and ER domains; and a thioesterase domain. The
DNA compounds of the invention that encode these domains individually or in
combination are useful in the construction of the hybrid PKS encoding DNA
compounds of the invention. For example, a DNA compound of the invention that
encodes an extender module or portion of an extender module is useful in the
construction of a coding sequence that encodes a protein subcomponent of a PKS.
The DNA compound of the invention that comprises a coding sequence of a PKS
subunit protein is useful in the construction of an expression vector that drives
expression of the subunit in a host cell that expresses the other subunits and so
produces a functional PKS.
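
The modular composition stated above can be expressed as a simple consistency
check on a proposed PKS design. The Python sketch below assumes only what the
paragraph states (a loading module, six extender modules each with KS, AT, and
ACP plus any subset of KR, DH, and ER, and a thioesterase domain); the function
names and the example layout are hypothetical and do not describe the actual
domain content of any megalomicin module.

```python
CORE_DOMAINS = {"KS", "AT", "ACP"}
REDUCTIVE_DOMAINS = {"KR", "DH", "ER"}
EXTENDER_MODULE_COUNT = 6


def check_extender_module(domains):
    """Validate one extender module; return how many reductive domains it has."""
    present = set(domains)
    missing = CORE_DOMAINS - present
    unknown = present - CORE_DOMAINS - REDUCTIVE_DOMAINS
    if missing:
        raise ValueError(f"missing core domains: {sorted(missing)}")
    if unknown:
        raise ValueError(f"unrecognized domains: {sorted(unknown)}")
    return len(present & REDUCTIVE_DOMAINS)


def check_pks(extender_modules, has_loading_module=True, has_thioesterase=True):
    """Validate a whole design; return the reductive-domain count per module."""
    if not has_loading_module:
        raise ValueError("a loading module is required")
    if not has_thioesterase:
        raise ValueError("a thioesterase domain is required")
    if len(extender_modules) != EXTENDER_MODULE_COUNT:
        raise ValueError(f"expected {EXTENDER_MODULE_COUNT} extender modules")
    return [check_extender_module(module) for module in extender_modules]


# Hypothetical layout for illustration only.
example_layout = [
    {"KS", "AT", "ACP", "KR"},
    {"KS", "AT", "ACP"},
    {"KS", "AT", "ACP", "KR", "DH"},
    {"KS", "AT", "ACP", "KR", "DH", "ER"},
    {"KS", "AT", "ACP", "KR"},
    {"KS", "AT", "ACP", "KR"},
]
```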

The recombinant DNA compounds of the invention that encode the
loading module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS
loading module is inserted into a DNA compound that comprises the coding
sequence for one or more heterologous PKS extender modules. The resulting
construct, in which the coding sequence for the loading module of the
heterologous PKS is replaced by the coding sequence for the megalomicin
PKS loading module, provides a novel PKS. Examples include the DEBS,
rapamycin, FK-506, FK-520, rifamycin, and avermectin PKS coding sequences. In
another embodiment, a DNA compound comprising a sequence that encodes the
megalomicin PKS loading module is inserted into a DNA compound that
comprises the coding sequence for the megalomicin PKS or a recombinant
megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion of the loading module coding sequence
is utilized in conjunction with a heterologous coding sequence. In this embodiment,
the invention provides, for example, replacing the methylmalonyl CoA (propionyl)
specific AT with a malonyl CoA (acetyl), ethylmalonyl CoA (butyryl), or other
CoA specific AT. In addition, the AT and/or ACP can be replaced by another AT
and/or another ACP or an inactivated KS, such as a KSQ, an AT, and/or another
ACP. The resulting heterologous loading module coding sequence can be utilized
in conjunction with a coding sequence for a PKS that synthesizes megalomicin, a
megalomicin derivative, or another polyketide.
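
The relationship between the specificity of the loading module AT and the
resulting starter unit, as given in the paragraph above, can be sketched as a
lookup plus a replacement operation. In the Python sketch below the pairings
(methylmalonyl CoA with propionyl, malonyl CoA with acetyl, ethylmalonyl CoA
with butyryl) come from the text; the LoadingModule class and the swap_at
function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, replace

# AT specificity and the starter unit it leads to, as paired in the text above.
AT_STARTER_UNIT = {
    "methylmalonyl CoA": "propionyl",
    "malonyl CoA": "acetyl",
    "ethylmalonyl CoA": "butyryl",
}


@dataclass(frozen=True)
class LoadingModule:
    source_pks: str
    at_specificity: str

    @property
    def starter_unit(self):
        return AT_STARTER_UNIT[self.at_specificity]


def swap_at(module, new_specificity):
    """Return a copy of the loading module with its AT specificity replaced."""
    if new_specificity not in AT_STARTER_UNIT:
        raise ValueError(f"no starter unit listed for {new_specificity!r}")
    return replace(module, at_specificity=new_specificity)


native = LoadingModule(source_pks="megalomicin", at_specificity="methylmalonyl CoA")
acetyl_primed = swap_at(native, "malonyl CoA")
```

Keeping the module immutable and returning a modified copy mirrors the way each
hybrid is described in the text as a separate construct derived from the native
coding sequence.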

The recombinant DNA compounds of the invention that encode the first 
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS first 
extender module is inserted into a DNA compound that comprises the coding 
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
first extender module of the megalomicin PKS or the latter is merely added to 
coding sequences for modules of the heterologous PKS, provides a novel PKS 
coding sequence. In another embodiment, a DNA compound comprising a 
sequence that encodes the first extender module of the megalomicin PKS is
inserted into a DNA compound that comprises coding sequences for the
megalomicin PKS or a recombinant megalomicin PKS that produces a 
megalomicin derivative. 

In another embodiment, a portion or all of the first extender module coding 
sequence is utilized in conjunction with other PKS coding sequences to create a
hybrid module. In this embodiment, the invention provides, for example, replacing
the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or 
2-hydroxymalonyl CoA specific AT; deleting (which includes inactivating) the 
KR; inserting a DH or a DH and ER; and/or replacing the KR with another KR, a 
DH and KR, or a DH, KR, and ER. In addition, the KS and/or ACP can be
replaced with another KS and/or ACP. In each of these replacements or insertions,
the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate 
from a coding sequence for another module of the megalomicin PKS, from a gene 
for a PKS that produces a polyketide other than megalomicin, or from chemical 
synthesis. The resulting heterologous first extender module coding sequence can
be utilized in conjunction with a coding sequence for a PKS that synthesizes
megalomicin, a megalomicin derivative, or another polyketide. 

Those of skill in the art will recognize, however, that deletion of the KR 
domain of extender module 1 or insertion of a DH domain or DH and KR domains
into extender module 1 will prevent the typical cyclization of the polyketide at the
hydroxyl group created by the KR if such hybrid module is employed as a first
extender module in a hybrid PKS or is otherwise involved in producing a portion
of the polyketide at which cyclization is to occur. Such deletions or insertions can
be useful, however, to create linear molecules or to induce cyclization at another
site in the molecule. 
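
The relationship between the reductive domains of an extender module and the group left at the beta-carbon can be sketched as follows. This is only an illustrative model under the usual assumption for modular PKSs (no reductive domain leaves a ketone, KR alone a hydroxyl, KR and DH an alkene, and KR, DH, and ER a methylene); the function and variable names are hypothetical and are not part of the invention.

```python
# Illustrative model only: maps the reductive domains present in an extender
# module to the functional group expected at the beta-carbon it processes.
BETA_CARBON_OUTCOME = {
    frozenset(): "ketone",
    frozenset({"KR"}): "hydroxyl",
    frozenset({"KR", "DH"}): "alkene",
    frozenset({"KR", "DH", "ER"}): "methylene",
}


def beta_carbon(domains):
    """Return the expected beta-carbon group for a module's domain list."""
    reductive = frozenset(d for d in domains if d in {"KR", "DH", "ER"})
    if reductive not in BETA_CARBON_OUTCOME:
        raise ValueError(f"non-canonical reductive set: {sorted(reductive)}")
    return BETA_CARBON_OUTCOME[reductive]


def can_cyclize_at(domains):
    """The typical cyclization needs a hydroxyl at the position in question."""
    return beta_carbon(domains) == "hydroxyl"


# Hypothetical domain lists for extender module 1 and two altered versions.
module1 = ["KS", "AT", "KR", "ACP"]
module1_kr_deleted = ["KS", "AT", "ACP"]
module1_dh_inserted = ["KS", "AT", "DH", "KR", "ACP"]
```

In this sketch, only the unaltered module 1 leaves the hydroxyl needed for the typical cyclization; the KR-deleted and DH-inserted variants do not.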

As noted above, the invention also provides recombinant PKSs and 
recombinant DNA compounds and vectors that encode such PKSs in which the 
KS domain of the first extender module has been inactivated. Such constructs are
typically expressed in translational reading frame with the first two extender
modules on a single protein, with the remaining modules and domains of a 
megalomicin, megalomicin derivative, or hybrid PKS expressed as one or more, 
typically two, proteins to form the multi-protein functional PKS. The utility of 
these constructs is that host cells expressing, or cell free extracts containing, the
PKS encoded thereby can be fed or supplied with N-acylcysteamine thioesters of
precursor molecules to prepare megalomicin derivative compounds. See U.S. 
patent application Serial No. 09/492,733, filed 27 Jan. 2000, and PCT publication 
Nos. WO 00/44717, 99/03986 and 97/02358, each of which is incorporated herein 
by reference. 

The recombinant DNA compounds of the invention that encode the second
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS 
second extender module is inserted into a DNA compound that comprises the
coding sequence for a heterologous PKS. The resulting construct, in which the
coding sequence for a module of the heterologous PKS is either replaced by that 
for the second extender module of the megalomicin PKS or the latter is merely 
added to coding sequences for the modules of the heterologous PKS, provides a 
novel PKS. In another embodiment, a DNA compound comprising a sequence that
encodes the second extender module of the megalomicin PKS is inserted into a
DNA compound that comprises the coding sequences for the megalomicin PKS or 
a recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion or all of the second extender module
coding sequence is utilized in conjunction with other PKS coding sequences to 
create a hybrid module. In this embodiment, the invention provides, for example, 
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl 
CoA, or 2-hydroxymalonyl CoA specific AT; deleting (or inactivating) the KR;
replacing the KR with a KR, a KR and a DH, or a KR, DH, and ER; and/or 
inserting a DH or a DH and an ER. In addition, the KS and/or ACP can be 
replaced with another KS and/or ACP. In each of these replacements or insertions, 
the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate
from a coding sequence for another module of the megalomicin PKS, from a
coding sequence for a PKS that produces a polyketide other than megalomicin, or
from chemical synthesis. The resulting heterologous second extender module 
coding sequence can be utilized in conjunction with a coding sequence from a 
PKS that synthesizes megalomicin, a megalomicin derivative, or another
polyketide.

The recombinant DNA compounds of the invention that encode the third 
extender module of the megalomicin PKS and the corresponding polypeptides 
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS third
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding 
sequence for a module of the heterologous PKS is either replaced by that for the 
third extender module of the megalomicin PKS or the latter is merely added to 
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the third extender module of the megalomicin PKS is inserted into a DNA 
compound that comprises coding sequences for the megalomicin PKS or a 
recombinant megalomicin PKS that produces a megalomicin derivative. 

In another embodiment, a portion or all of the third extender module
coding sequence is utilized in conjunction with other PKS coding sequences to
create a hybrid module. In this embodiment, the invention provides, for example, 
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl 
CoA, or 2-hydroxymalonyl CoA specific AT; deleting the inactive KR; and/or
replacing the KR with an active KR, or a KR and DH, or a KR, DH, and ER. In
addition, the KS and/or ACP can be replaced with another KS and/or ACP. In 
each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, 
or ACP coding sequence can originate from a coding sequence for another module 
of the megalomicin PKS, from a gene for a PKS that produces a polyketide other
than megalomicin, or from chemical synthesis. The resulting heterologous third 
extender module coding sequence can be utilized in conjunction with a coding 
sequence for a PKS that synthesizes megalomicin, a megalomicin derivative, or 
another polyketide. 

The recombinant DNA compounds of the invention that encode the fourth
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS fourth 
extender module is inserted into a DNA compound that comprises the coding 
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the 
fourth extender module of the megalomicin PKS or the latter is merely added to 
coding sequences for the modules of the heterologous PKS, provides a novel PKS. 
In another embodiment, a DNA compound comprising a sequence that encodes 
the fourth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises coding sequences for the megalomicin PKS or a 
recombinant megalomicin PKS that produces a megalomicin derivative. 

In another embodiment, a portion of the fourth extender module coding 
sequence is utilized in conjunction with other PKS coding sequences to create a 
hybrid module. In this embodiment, the invention provides, for example, replacing
the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or 
2-hydroxymalonyl CoA specific AT; deleting or inactivating any one, two, or all
three of the ER, DH, and KR; and/or replacing any one, two, or all three of the ER, 
DH, and KR with either a KR, a DH and KR, or a KR, DH, and ER. In addition, 
the KS and/or ACP can be replaced with another KS and/or ACP. In each of these 
replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding 
sequence can originate from a coding sequence for another module of the 
megalomicin PKS (except for the DH and ER domains), from a coding sequence
for a PKS that produces a polyketide other than megalomicin, or from chemical
synthesis. The resulting heterologous fourth extender module coding sequence can 
be utilized in conjunction with a coding sequence for a PKS that synthesizes 
megalomicin, a megalomicin derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the fifth
extender module of the megalomicin PKS and the corresponding polypeptides 
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS fifth 
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the 
fifth extender module of the megalomicin PKS or the latter is merely added to 
coding sequences for the modules of the heterologous PKS, provides a novel PKS. 
In another embodiment, a DNA compound comprising a sequence that encodes
the fifth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises the coding sequence for the megalomicin PKS or a 
recombinant megalomicin PKS that produces a megalomicin derivative. 

In another embodiment, a portion or all of the fifth extender module 
coding sequence is utilized in conjunction with other PKS coding sequences to
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl 
CoA, or 2-hydroxymalonyl CoA specific AT; deleting (or inactivating) the KR; 
inserting a DH or a DH and ER; and/or replacing the KR with another KR, a DH 
and KR, or a DH, KR, and ER. In addition, the KS and/or ACP can be replaced
with another KS and/or ACP. In each of these replacements or insertions, the
heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a 
coding sequence for another module of the megalomicin PKS, from a coding 
sequence for a PKS that produces a polyketide other than megalomicin, or from 
chemical synthesis. The resulting heterologous fifth extender module coding
sequence can be utilized in conjunction with a coding sequence for a PKS that
synthesizes megalomicin, a megalomicin derivative, or another polyketide. 

The recombinant DNA compounds of the invention that encode the sixth 
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS sixth 
extender module is inserted into a DNA compound that comprises the coding 
sequence for a heterologous PKS. The resulting construct, in which the coding 
sequence for a module of the heterologous PKS is either replaced by that for the
sixth extender module of the megalomicin PKS or the latter is merely added to 
coding sequences for the modules of the heterologous PKS, provides a novel PKS. 
In another embodiment, a DNA compound comprising a sequence that encodes 
the sixth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises the coding sequences for the megalomicin PKS or a
recombinant megalomicin PKS that produces a megalomicin derivative. 

In another embodiment, a portion or all of the sixth extender module 
coding sequence is utilized in conjunction with other PKS coding sequences to 
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting or inactivating the KR or 
replacing the KR with another KR, a KR and DH, or a KR, DH, and an ER; and/or 
inserting a DH or a DH and ER. In addition, the KS and/or ACP can be replaced 
with another KS and/or ACP. In each of these replacements or insertions, the
heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a
coding sequence for another module of the megalomicin PKS, from a coding 
sequence for a PKS that produces a polyketide other than megalomicin, or from 
chemical synthesis. The resulting heterologous sixth extender module coding 
sequence can be utilized in conjunction with a coding sequence for a PKS that
synthesizes megalomicin, a megalomicin derivative, or another polyketide.

The sixth extender module of the megalomicin PKS is followed by a 
thioesterase domain. This domain is important in the cyclization of the polyketide 
and its cleavage from the PKS. The present invention provides recombinant DNA 
compounds that encode hybrid PKS enzymes in which the megalomicin PKS is
fused to a heterologous thioesterase or a heterologous PKS is fused to the
megalomicin PKS thioesterase. Thus, for example, a thioesterase domain coding 
sequence from another PKS gene can be inserted at the end of the sixth (or other 
final) extender module coding sequence in recombinant DNA compounds of the
invention or the megalomicin PKS thioesterase can be similarly fused to a
heterologous PKS. Recombinant DNA compounds encoding this thioesterase 
domain are useful in constructing DNA compounds that encode the megalomicin 
PKS, a PKS that produces a megalomicin derivative, and a PKS that produces a 
polyketide other than megalomicin or a megalomicin derivative.

Thus, the hybrid modules of the invention are incorporated into a PKS to 
provide a hybrid PKS of the invention. A hybrid PKS of the invention can result 
not only: 

(i) from fusions of heterologous domain (where heterologous means the
domains in a module are derived from at least two different naturally occurring
modules) coding sequences to produce a hybrid module coding sequence
contained in a PKS gene whose product is incorporated into a PKS,

but also:

(ii) from fusions of heterologous modules (where heterologous module
means two modules are adjacent to one another that are not adjacent to one
another in naturally occurring PKS enzymes) coding sequences to produce a
hybrid coding sequence contained in a PKS gene whose product is incorporated
into a PKS,

(iii) from expression of one or more megalomicin PKS genes with one or
more non-megalomicin PKS genes, including both naturally occurring and
recombinant non-megalomicin PKS genes, and

(iv) from combinations of the foregoing. 

Various hybrid PKSs of the invention illustrating these various alternatives are 
described herein. 

An example of a hybrid PKS comprising fused modules results from
fusion of the loading module of either the DEBS PKS or the narbonolide PKS (see
PCT patent application No. US99/11814, incorporated herein by reference) with
extender modules 1 and 2 of the megalomicin PKS to produce a hybrid megAI
gene. Co-expression of either one of these two hybrid megAI genes with the
megAII and megAIII genes in suitable host cells, such as Streptomyces lividans,
results in expression of a hybrid PKS of the invention that produces
6-deoxyerythronolide B (the polyketide product of the natural megA genes) in
recombinant host cells. Co-expression of either one of these two hybrid megAI
genes with the eryAII and eryAIII genes similarly results in the production of
6-dEB, while co-expression with the analogous narbonolide PKS genes, picAII,
picAIII, and picAIV, results in the production of 3-deoxy-3-oxo-6-dEB
(3-keto-6-dEB), useful in the production of ketolides, compounds with potent
anti-bacterial activity.

Another example of a hybrid PKS comprising a hybrid module is prepared
by co-expressing the megAI and megAII genes with a megAIII hybrid gene
encoding extender module 5 and the KS and AT of extender module 6 of the
megalomicin PKS fused to the ACP of module 6 and the TE of the narbonolide
PKS. The resulting hybrid PKS of the invention produces 3-keto-6-dEB. This
compound can also be prepared by a recombinant megalomicin derivative PKS of
the invention in which the KR domain of module 6 of the megalomicin PKS has
been deleted. Moreover, the invention provides hybrid PKSs in which not only the
above changes have been made but also the AT domain of module 6 has been
replaced with a malonyl-specific AT. These hybrid PKSs produce
2-desmethyl-3-deoxy-3-oxo-6-dEB, a useful intermediate in the preparation of
2-desmethyl ketolides, compounds with potent antibiotic activity.

Another illustrative example of a hybrid PKS includes the hybrid PKS of
the invention resulting only from the latter change in the hybrid PKS just
described. Thus, co-expression of the megAI and megAII genes with a hybrid
megAIII gene in which the AT domain of module 6 has been replaced by a
malonyl-specific AT results in the expression of a hybrid PKS that produces
2-desmethyl-6-dEB in recombinant host cells. This compound is a useful
intermediate for making 2-desmethyl erythromycins in recombinant host cells of
the invention, as well as for making 2-desmethyl semi-synthetic ketolides.

While many of the hybrid PKSs described above are composed primarily
of megalomicin PKS proteins, those of skill in the art recognize that the present
invention provides many different hybrid PKSs, including those composed of only
a small portion of the megalomicin PKS. For example, the present invention
provides a hybrid PKS in which a hybrid eryAI gene that encodes the megalomicin
PKS loading module fused to extender modules 1 and 2 of DEBS is coexpressed
with the eryAII and eryAIII genes. The resulting hybrid PKS produces 6-dEB, the
product of the native DEBS. When the construct is expressed in
Saccharopolyspora erythraea host cells (either via integration into the
chromosome or via a vector that encodes the hybrid PKS), the resulting
recombinant host cell of the invention produces erythromycins. Another
illustrative example is the hybrid PKS of the invention composed of the megAI
and eryAII and eryAIII gene products. This construct is also useful in producing
erythromycins in Saccharopolyspora erythraea host cells. In a preferred
embodiment, the S. erythraea host cells are eryAI mutants that do not produce
6-deoxyerythronolide B.

Another example is the hybrid PKS of the invention composed of the
products of the picAI and picAII genes (the two proteins that comprise the loading
module and extender modules 1 - 4, inclusive, of the narbonolide PKS) and the
megAIII gene. The resulting hybrid PKS produces the macrolide aglycone
3-hydroxy-narbonolide in Streptomyces lividans host cells and the corresponding
erythromycins in Saccharopolyspora erythraea host cells.

Each of the foregoing hybrid PKS enzymes of the invention, and the hybrid
PKS enzymes of the invention generally, can be expressed in a host cell that also
expresses a functional oleP gene product. The oleP gene encodes an oleandomycin
modification enzyme, and expression of the gene together with a hybrid PKS of
the invention provides the compounds of the invention in which a C-8 hydroxyl, a
C-8a or C-8-C-8a epoxide is present.

Recombinant methods for manipulating modular PKS genes to make
hybrid PKS enzymes are described in U.S. Patent Nos. 5,672,491; 5,843,718;
5,830,750; and 5,712,146; and in PCT publication Nos. 98/49315 and 97/02358,
each of which is incorporated herein by reference. A number of genetic
engineering strategies have been used with DEBS to demonstrate that the
structures of polyketides can be manipulated to produce novel natural products,
primarily analogs of the erythromycins (see the patent publications referenced
supra and Hutchinson, 1998, Curr. Opin. Microbiol. 1:319-329, and Baltz, 1998,
Trends Microbiol. 6:76-83, incorporated herein by reference). Because of the
similar activity of the megalomicin PKS and DEBS (both PKS enzymes produce
the macrolide aglycone 6-dEB), these methods can be readily applied to the 
recombinant megalomicin PKS genes of the invention.
These techniques include: (i) deletion or insertion of modules to control
chain length, (ii) inactivation of reduction/dehydration domains to bypass
beta-carbon processing steps, (iii) substitution of AT domains to alter starter and
extender units, (iv) addition of reduction/dehydration domains to introduce
catalytic activities, and (v) substitution of ketoreductase (KR) domains to control
hydroxyl stereochemistry. In addition, engineered blocked mutants of DEBS have 
been used for precursor directed biosynthesis of analogs that incorporate 
synthetically derived starter units. For example, more than 100 novel polyketides 
were produced by engineering single and combinatorial changes in multiple
modules of DEBS. Hybrid PKS enzymes based on DEBS with up to three catalytic
domain substitutions were constructed by cassette mutagenesis, in which various
DEBS domains were replaced with domains from the rapamycin PKS (see
Schwecke et al., 1995, Proc. Natl. Acad. Sci. USA 92, 7839-7843, incorporated
herein by reference) or one or more of the DEBS KR domains was deleted.
Functional single domain replacements or deletions were combined to generate
DEBS enzymes with double and triple catalytic domain substitutions (see
McDaniel et al., 1999, Proc. Natl. Acad. Sci. USA 96, 1846-1851, incorporated
herein by reference). By providing the analogous megalomicin/rapamycin hybrid
PKS enzymes, the present invention provides alternative means to make these
polyketides.
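
As a rough illustration of technique (i), the effect of module deletion or insertion on macrolactone ring size can be estimated with a one-line rule. The sketch below assumes a DEBS-like PKS in which each extender module adds two backbone carbons and the ring closes onto the hydroxyl created by extender module 1; the function name is hypothetical, and the rule is a simplification rather than a statement about any particular construct.

```python
def macrolactone_ring_size(n_extender_modules):
    """Estimate ring size for a DEBS-like PKS (simplifying assumption).

    Each extender module contributes two ring carbons; one further carbon
    (the starter carbon bearing the cyclizing hydroxyl) and the ring oxygen
    complete the lactone.
    """
    if n_extender_modules < 1:
        raise ValueError("at least one extender module is required")
    return 2 * n_extender_modules + 2


# Six extender modules, as in DEBS and the megalomicin PKS.
native_ring = macrolactone_ring_size(6)
```

Under these assumptions, six extender modules correspond to a 14-membered ring, while removing or adding one module shifts the estimate by two ring atoms.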

Methods for generating libraries of polyketides have been greatly improved
by cloning PKS genes as a set of three or more mutually selectable plasmids, each
carrying a different wild-type or mutant PKS gene, then introducing all possible
combinations of the plasmids with wild-type, mutant, and hybrid PKS coding
sequences into the same host (see U.S. patent application Serial No. 60/129,731,
filed 16 Apr. 1999, and PCT Pub. No. 98/27203, each of which is incorporated
herein by reference). This method can also incorporate the use of a KS1° mutant,
which by mutational biosynthesis can produce polyketides made from diketide
starter units (see Jacobsen et al., 1997, Science 277, 367-369, incorporated herein
by reference), as well as the use of a truncated gene that leads to 12-membered
macrolides or an elongated gene that leads to 16-membered ketolides. Moreover,
by utilizing in addition one or more vectors that encode glycosyl biosynthesis and
transfer genes, such as those of the present invention for megosamine,
desosamine, oleandrose, cladinose, and/or mycarose (in any combination), a large
collection of glycosylated polyketides can be prepared.
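
The multi-plasmid approach described above amounts to taking one variant from each plasmid set and enumerating every combination. A minimal sketch follows; the variant labels are hypothetical placeholders chosen for illustration and do not correspond to specific constructs of the invention.

```python
from itertools import product

# Hypothetical variant sets, one per mutually selectable plasmid.
plasmid_1 = ["megAI (wild type)", "megAI (KS1 null)"]
plasmid_2 = ["megAII (wild type)", "megAII (module 3 AT swap)"]
plasmid_3 = [
    "megAIII (wild type)",
    "megAIII (module 6 KR deleted)",
    "megAIII (module 6 AT swap)",
]


def enumerate_library(*plasmid_sets):
    """Return every combination of one variant per plasmid set."""
    return list(product(*plasmid_sets))


library = enumerate_library(plasmid_1, plasmid_2, plasmid_3)
```

The library size is the product of the sizes of the individual sets, which is why a modest number of variants per plasmid can yield a large collection of PKS combinations.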

The following Table lists references describing illustrative PKS genes and
corresponding enzymes that can be utilized in the construction of the recombinant
hybrid PKSs and the corresponding DNA compounds that encode them of the
invention. Also presented are various references describing tailoring enzymes and
corresponding genes that can be employed in accordance with the methods of the
invention.

Avermectin
    U.S. Pat. No. 5,252,474 to Merck.
    MacNeil et al., 1993, Industrial Microorganisms: Basic and Applied
    Molecular Genetics, Baltz, Hegeman, & Skatrud, eds. (ASM), pp. 245-256, A
    Comparison of the Genes Encoding the Polyketide Synthases for Avermectin,
    Erythromycin, and Nemadectin.
    MacNeil et al., 1992, Gene 115: 119-125, Complex Organization of the
    Streptomyces avermitilis genes encoding the avermectin polyketide synthase.

Candicidin (FR008)
    Hu et al., 1994, Mol. Microbiol. 14: 163-172.

Epothilone
    PCT Pub. No. 00/031247 to Kosan.

Erythromycin
    PCT Pub. No. 93/13663 to Abbott.
    US Pat. No. 5,824,513 to Abbott.
    Donadio et al., 1991, Science 252:675-9.
    Cortes et al., 8 Nov. 1990, Nature 348:176-8, An unusually large
    multifunctional polypeptide in the erythromycin producing polyketide synthase of
    Saccharopolyspora erythraea.
    Glycosylation Enzymes
        PCT Pub. No. 97/23630 to Abbott.

FK-506
    Motamedi et al., 1998, The biosynthetic gene cluster for the macrolactone
    ring of the immunosuppressant FK506, Eur. J. Biochem. 256: 528-534.
    Motamedi et al., 1997, Structural organization of a multifunctional
    polyketide synthase involved in the biosynthesis of the macrolide
    immunosuppressant FK506, Eur. J. Biochem. 244: 74-80.
    Methyltransferase
        US 5,264,355, issued 23 Nov. 1993, Methylating enzyme from
        Streptomyces MA6858. 31-O-desmethyl-FK506 methyltransferase.
        Motamedi et al., 1996, Characterization of methyltransferase and
        hydroxylase genes involved in the biosynthesis of the immunosuppressants
        FK506 and FK520, J. Bacteriol. 178: 5243-5248.

FK-520
    PCT Pub. No. 00/20601 to Kosan.
    See also Nielsen et al., 1991, Biochem. 30:5789-96 (enzymology of
    pipecolate incorporation).

Lovastatin
    U.S. Pat. No. 5,744,350 to Merck.

Narbomycin (and Picromycin)
    PCT Pub. No. WO US99/61599 to Kosan.

Nemadectin
    MacNeil et al., 1993, supra.

Niddamycin
    Kakavas et al., 1997, Identification and characterization of the niddamycin
    polyketide synthase genes from Streptomyces caelestis, J. Bacteriol. 179:
    7515-7522.

Oleandomycin
    Swan et al., 1994, Characterization of a Streptomyces antibioticus gene
    encoding a type I polyketide synthase which has an unusual coding sequence, Mol.
    Gen. Genet. 242: 358-362.
    PCT Pub. No. 00/026349 to Kosan.
    Olano et al., 1998, Analysis of a Streptomyces antibioticus chromosomal
    region involved in oleandomycin biosynthesis, which encodes two
    glycosyltransferases responsible for glycosylation of the macrolactone ring, Mol.
    Gen. Genet. 259(3): 299-308.

Platenolide
    EP Pub. No. 791,656 to Lilly.

Rapamycin
    Schwecke et al., Aug. 1995, The biosynthetic gene cluster for the
    polyketide rapamycin, Proc. Natl. Acad. Sci. USA 92:7839-7843.
    Aparicio et al., 1996, Organization of the biosynthetic gene cluster for
    rapamycin in Streptomyces hygroscopicus: analysis of the enzymatic domains in
    the modular polyketide synthase, Gene 169: 9-16.

Rifamycin
    August et al., 13 Feb. 1998, Biosynthesis of the ansamycin antibiotic
    rifamycin: deductions from the molecular analysis of the rif biosynthetic gene
    cluster of Amycolatopsis mediterranei S699, Chemistry & Biology, 5(2): 69-79.

Soraphen
    U.S. Pat. No. 5,716,849 to Novartis.
    Schupp et al., 1995, J. Bacteriology 177: 3673-3679. A Sorangium
    cellulosum (Myxobacterium) Gene Cluster for the Biosynthesis of the Macrolide
    Antibiotic Soraphen A: Cloning, Characterization, and Homology to Polyketide
    Synthase Genes from Actinomycetes.

Spiramycin
    U.S. Pat. No. 5,098,837 to Lilly.
    Activator Gene
        U.S. Pat. No. 5,514,544 to Lilly.

Tylosin
    EP Pub. No. 791,655 to Lilly.
    Kuhstoss et al., 1996, Gene 183:231-6, Production of a novel polyketide
    through the construction of a hybrid polyketide synthase.
    U.S. Pat. No. 5,876,991 to Lilly.
    Tailoring enzymes
        Merson-Davies and Cundliffe, 1994, Mol. Microbiol. 13: 349-355.
        Analysis of five tylosin biosynthetic genes from the tylBA region of the
        Streptomyces fradiae genome.

As the above Table illustrates, there are a wide variety of PKS genes that serve as 
readily available sources of DNA and sequence information for use in constructing 
the hybrid PKS-encoding DNA compounds of the invention.

In constructing hybrid PKSs of the invention, certain general methods may
be helpful. For example, it is often beneficial to retain the framework of the
module to be altered to make the hybrid PKS. Thus, if one desires to add DH and
ER functionalities to a module, it is often preferred to replace the KR domain of
the original module with a cognate KR, DH, and ER domain-containing segment
from another module, instead of merely inserting DH and ER domains. One can
alter the stereochemical specificity of a module by replacement of the KS domain
with a KS domain from a module that specifies a different stereochemistry. See
Lau et al., 1999, "Dissecting the role of acyltransferase domains of modular
polyketide synthases in the choice and stereochemical fate of extender units",
Biochemistry 38(5): 1643-1651, incorporated herein by reference. One can alter the
specificity of an AT domain by changing only a small segment of the domain. See
Lau et al., supra. One can also take advantage of known linker regions in PKS
proteins to link modules from two different PKSs to create a hybrid PKS. See
Gokhale et al., 16 Apr. 1999, "Dissecting and Exploiting Intermodular
Communication in Polyketide Synthases", Science 284: 482-485, incorporated
herein by reference.

The hybrid PKS-encoding DNA compounds of the invention can be and
often are hybrids of more than two PKS genes. Even where only two genes are
used, there are often two or more modules in the hybrid gene in which all or part
of the module is derived from a second (or third) PKS gene. Thus, as one
illustrative example, the invention provides a hybrid PKS that contains the
naturally occurring loading module and thioesterase domain as well as extender
modules one, two, four, and six of the megalomicin PKS and further contains
hybrid or heterologous extender modules three and five. Hybrid or heterologous
extender modules three and five contain AT domains specific for malonyl CoA
and derived from, for example, the rapamycin PKS genes.
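
The consequence of such AT replacements for the product can be sketched by mapping each extender module to the ring carbon whose methyl branch it supplies. The sketch assumes the conventional 6-dEB numbering, in which extender module n of a six-module PKS supplies the branch at carbon 2 x (7 - n) (so module 6 corresponds to C-2, consistent with the 2-desmethyl compounds described above); the names used are hypothetical.

```python
# Assumption: all six native extender ATs are methylmalonyl CoA specific.
NATIVE_AT = {module: "methylmalonyl" for module in range(1, 7)}


def branch_carbon(module):
    """Ring carbon bearing the side chain supplied by an extender module."""
    if module not in range(1, 7):
        raise ValueError("module must be between 1 and 6")
    return 2 * (7 - module)


def desmethyl_positions(at_by_module):
    """Carbons that lose their methyl branch when the AT accepts malonyl CoA."""
    return sorted(
        branch_carbon(module)
        for module, substrate in at_by_module.items()
        if substrate == "malonyl"
    )


# Hybrid PKS of the illustrative example: malonyl-specific ATs in modules 3 and 5.
hybrid = dict(NATIVE_AT)
hybrid[3] = "malonyl"
hybrid[5] = "malonyl"
positions = desmethyl_positions(hybrid)
```

Under the stated numbering assumption, malonyl-specific ATs in modules three and five would be expected to remove the methyl branches at C-8 and C-4, respectively.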

The invention also provides libraries of PKS genes, PKS proteins, and
ultimately, of polyketides, that are constructed by generating modifications in the
megalomicin PKS so that the protein complexes produced have altered activities
in one or more respects and thus produce polyketides other than the natural
product of the PKS. Novel polyketides may thus be prepared, or polyketides in
general prepared more readily, using this method. By providing a large number of
different genes or gene clusters derived from a naturally occurring PKS gene
cluster, each of which has been modified in a different way from the native cluster,
an effectively combinatorial library of polyketides can be produced as a result of
the multiple variations in these activities. As will be further described below, the
metes and bounds of this embodiment of the invention can be described on the
polyketide, protein, and the encoding nucleotide sequence levels.

As described above, a modular PKS "derived from" the megalomicin or
other naturally occurring PKS includes a modular PKS (or its corresponding
encoding gene(s)) that retains the scaffolding of the utilized portion of the
naturally occurring gene. Not all modules need be included in the constructs;
however, the constructs can also comprise more than six modules. On the constant
scaffold, at least one enzymatic activity is mutated, deleted, replaced, or inserted
so as to alter the activity of the resulting PKS relative to the original (native) PKS.
Alteration results when these activities are deleted or are replaced by a different
version of the activity, or simply mutated in such a way that a polyketide other
than the natural product results from these collective activities. This occurs
because there has been a resulting alteration of the starter unit and/or extender
unit, stereochemistry, chain length or cyclization, and/or reductive or dehydration
cycle outcome at a corresponding position in the product polyketide. Where a
deleted activity is replaced, the origin of the replacement activity may come from a
corresponding activity in a different naturally occurring PKS or from a different
region of the megalomicin PKS. Any or all of the megalomicin PKS genes may be
included in the derivative or portions of any of these may be included, but the
scaffolding of a functional PKS protein is retained in whatever derivative is
constructed. The derivative preferably contains a thioesterase activity from the
megalomicin or another PKS.

Thus, a PKS derived from the megalomicin PKS includes a PKS that
contains the scaffolding of all or a portion of the megalomicin PKS. The derived
PKS also contains at least two extender modules that are functional, preferably
three extender modules, and more preferably four or more extender modules, and
most preferably six extender modules. The derived PKS also contains mutations,
deletions, insertions, or replacements of one or more of the activities of the
functional modules of the megalomicin PKS so that the nature of the resulting
polyketide is altered at both the protein and DNA sequence levels. Particularly
preferred embodiments include those wherein a KS, AT, or ACP domain has been
deleted or replaced by a version of the activity from a different PKS or from
another location within the same PKS. Also preferred are derivatives where at
least one non-condensation cycle enzymatic activity (KR, DH, or ER) has been
deleted or added or wherein any of these activities has been mutated so as to
change the structure of the polyketide synthesized by the PKS.

Conversely, also included within the definition of a PKS derived from the
megalomicin PKS are functional non-megalomicin PKS modules or their
encoding genes wherein at least one domain or coding sequence therefor of a
megalomicin PKS module has been inserted. Exemplary is the use of the
megalomicin AT for extender module 2, which accepts a methylmalonyl CoA
extender unit rather than malonyl CoA, to replace a malonyl specific AT in
another PKS. Other examples include insertion of portions of non-condensation
cycle enzymatic activities or other regions of megalomicin synthase activity into a
heterologous PKS at both the DNA and protein levels.

Thus, there are at least five degrees of freedom for constructing a hybrid
PKS in terms of the polyketide that will be produced. First, the polyketide chain
length is determined by the number of extender modules in the PKS, and the
present invention includes hybrid PKSs that contain 6, as well as fewer or more
than 6, extender modules. Second, the nature of the carbon skeleton of the PKS is
determined by the specificities of the acyl transferases that determine the nature of
the extender units at each position, e.g., malonyl, methylmalonyl, ethylmalonyl, or
other substituted malonyl. Third, the loading module specificity also has an effect
on the resulting carbon skeleton of the polyketide. The loading module may use a
different starter unit, such as acetyl, butyryl, and the like. As noted above, another
method for varying loading module specificity involves inactivating the KS
activity in extender module 1 (KS1) and providing alternative substrates, called
diketides, that are chemically synthesized analogs of extender module 1 diketide
products, for extender module 2. This approach was illustrated in PCT publication
Nos. WO 97/02358 and WO 99/03986, incorporated herein by reference, wherein the KS1
activity was inactivated through mutation. Fourth, the oxidation state at various
positions of the polyketide will be determined by the dehydratase and reductase
portions of the modules. This will determine the presence and location of ketone
and alcohol moieties and C-C double bonds or C-C single bonds in the polyketide.
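
The fourth degree of freedom can be illustrated, in a non-limiting manner, by a short sketch that maps the reductive domains present in a module to the functionality expected at the corresponding position of the polyketide. The function name is hypothetical; the four outcomes are those named in the text (ketone, alcohol, C-C double bond, C-C single bond).

```python
def beta_carbon_outcome(domains):
    """Map the set of reductive domains in a module to the resulting functionality."""
    present = frozenset(domains)
    outcomes = {
        frozenset(): "ketone",                            # no reductive domains
        frozenset({"KR"}): "alcohol",                     # ketoreduction only
        frozenset({"KR", "DH"}): "C-C double bond",       # reduction, then dehydration
        frozenset({"KR", "DH", "ER"}): "C-C single bond", # complete reductive cycle
    }
    if present not in outcomes:
        # Combinations such as DH without KR are not treated as functional in this sketch.
        raise ValueError(f"unsupported domain combination: {sorted(present)}")
    return outcomes[present]


print(beta_carbon_outcome(["KR", "DH"]))
```

Combinations lacking a KR but containing a DH or ER are rejected here as a simplifying assumption, since each later step acts on the product of the earlier one.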

Finally, the stereochemistry of the resulting polyketide is a function of
three aspects of the synthase. The first aspect is related to the AT/KS specificity
associated with substituted malonyls as extender units, which affects
stereochemistry only when the reductive cycle is missing or when it contains only
a ketoreductase, as the dehydratase would abolish chirality. Second, the specificity
of the ketoreductase may determine the chirality of any beta-OH. Finally, the
enoylreductase specificity for substituted malonyls as extender units may influence
the stereochemistry when there is a complete KR/DH/ER available.

Thus, the modular PKS systems generally and the megalomicin PKS
system particularly permit a wide range of polyketides to be synthesized. As
compared to the aromatic PKS systems, the modular PKS systems accept a wider
range of starter units, including aliphatic monomers (acetyl, propionyl, butyryl,
isovaleryl, and the like), aromatics (aminohydroxybenzoyl), alicyclics
(cyclohexanoyl), and heterocyclics (thiazolyl). Certain modular PKSs have relaxed
specificity for their starter units (Kao et al., 1994, Science, supra). Modular PKSs
also exhibit considerable variety with regard to the choice of extender units in
each condensation cycle. The degree of beta-ketoreduction following a
condensation reaction can be altered by genetic manipulation (Donadio et al.,
1991, Science, supra; Donadio et al., 1993, Proc. Natl. Acad. Sci. USA 90: 7119-
7123). Likewise, the size of the polyketide product can be varied by designing
mutants with the appropriate number of modules (Kao et al., 1994, J. Am. Chem.
Soc. 116: 11612-11613). Lastly, modular PKS enzymes are particularly well
known for generating an impressive range of asymmetric centers in their products
in a highly controlled manner. The polyketides, antibiotics, and other compounds
produced by the methods of the invention are typically single stereoisomeric
forms. Although the compounds of the invention can occur as mixtures of
stereoisomers, it may be beneficial in some instances to generate individual
stereoisomers. Thus, the combinatorial potential within modular PKS pathways
based on any naturally occurring modular, such as the megalomicin, PKS scaffold
is virtually unlimited.

While hybrid PKSs are most often produced by "mixing and matching"
portions of PKS coding sequences, mutations in DNA encoding a PKS can also be
used to introduce, alter, or delete an activity in the encoded polypeptide. Mutations
can be made to the native sequences using conventional techniques. The substrates
for mutation can be an entire cluster of genes or only one or two of them; the
substrate for mutation may also be portions of one or more of these genes.
Techniques for mutation include preparing synthetic oligonucleotides including
the mutations and inserting the mutated sequence into the gene encoding a PKS
subunit using restriction endonuclease digestion. See, e.g., Kunkel, 1985, Proc.
Natl. Acad. Sci. USA 82: 448; Geisselsoder et al., 1987, BioTechniques 5: 786.
Alternatively, the mutations can be effected using a mismatched primer (generally
10-20 nucleotides in length) that hybridizes to the native nucleotide sequence, at a
temperature below the melting temperature of the mismatched duplex. The primer
can be made specific by keeping primer length and base composition within
relatively narrow limits and by keeping the mutant base centrally located. See
Zoller and Smith, 1983, Methods Enzymol. 100: 468. Primer extension is effected
using DNA polymerase, the product cloned, and clones containing the mutated
DNA, derived by segregation of the primer extended strand, selected.
Identification can be accomplished using the mutant primer as a hybridization
probe. The technique is also applicable for generating multiple point mutations.
See, e.g., Dalbie-McFarland et al., 1982, Proc. Natl. Acad. Sci. USA 79: 6409.
PCR mutagenesis can also be used to effect the desired mutations.
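
The design of a mismatched primer with a centrally located mutant base, as described above, can be sketched as follows. This is a minimal illustration only: the function names and the example sequence are hypothetical, the primer is written in the sense orientation, and the melting temperature is estimated with the Wallace rule of thumb (2 degrees per A or T, 4 degrees per G or C), which is an assumption of this sketch rather than a method taken from the references cited.

```python
def mutagenic_primer(template, pos, new_base, flank=8):
    """Return a primer of length 2*flank + 1 carrying new_base at its center."""
    if template[pos] == new_base:
        raise ValueError("new base is identical to the native base")
    if pos < flank or pos + flank >= len(template):
        raise ValueError("not enough flanking sequence around the mutation site")
    return template[pos - flank:pos] + new_base + template[pos + 1:pos + flank + 1]


def wallace_tm(oligo):
    """Rough melting temperature estimate (degrees C) for a short oligonucleotide."""
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


# Hypothetical 28-nucleotide template; position 13 (0-based) is changed from G to A.
template = "ATGGCCAAGCTTGGATCCGTACGTTAGC"
primer = mutagenic_primer(template, 13, "A")
print(primer, len(primer), wallace_tm(primer))
```

With the default flank of 8, the primer is 17 nucleotides long, within the 10-20 nucleotide range stated above, and the single mismatch sits at the central position.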

Random mutagenesis of selected portions of the nucleotide sequences
encoding enzymatic activities can also be accomplished by several different
techniques known in the art, e.g., by inserting an oligonucleotide linker randomly
into a plasmid, by irradiation with X-rays or ultraviolet light, by incorporating
incorrect nucleotides during in vitro DNA synthesis, by error-prone PCR
mutagenesis, by preparing synthetic mutants, or by damaging plasmid DNA in
vitro with chemicals, in accordance with the methods of the present invention.
Chemical mutagens include, for example, sodium bisulfite, nitrous acid,
nitrosoguanidine, hydroxylamine, agents which damage or remove bases thereby
preventing normal base-pairing such as hydrazine or formic acid, analogues of
nucleotide precursors such as 5-bromouracil, 2-aminopurine, or acridine
intercalating agents such as proflavine, acriflavine, quinacrine, and the like.
Generally, plasmid DNA or DNA fragments are treated with chemical mutagens,
transformed into E. coli and propagated as a pool or library of mutant plasmids.

In constructing a hybrid PKS of the invention, regions encoding enzymatic
activity, i.e., regions encoding corresponding activities from different PKS
synthases or from different locations in the same PKS, can be recovered, for
example, using PCR techniques with appropriate primers. By "corresponding"
activity encoding regions is meant those regions encoding the same general type of
activity. For example, a KR activity encoded at one location of a gene cluster
"corresponds" to a KR encoding activity in another location in the gene cluster or
in a different gene cluster. Similarly, a complete reductase cycle could be
considered corresponding. For example, KR/DH/ER can correspond to a KR
alone.

If replacement of a particular target region in a host PKS is to be made,
this replacement can be conducted in vitro using suitable restriction enzymes. The
replacement can also be effected in vivo using recombinant techniques involving
homologous sequences framing the replacement gene in a donor plasmid and a
receptor region in a recipient plasmid. Such systems, advantageously involving
plasmids of differing temperature sensitivities, are described, for example, in PCT
publication No. WO 96/40968, incorporated herein by reference. The vectors used
to perform the various operations to replace the enzymatic activity in the host PKS
genes or to support mutations in these regions of the host PKS genes can be
chosen to contain control sequences operably linked to the resulting coding
sequences in a manner such that expression of the coding sequences can be
effected in an appropriate host.

However, simple cloning vectors may be used as well. If the cloning
vectors employed to obtain PKS genes encoding derived PKS lack control
sequences for expression operably linked to the encoding nucleotide sequences,
the nucleotide sequences are inserted into appropriate expression vectors. This
need not be done individually, but a pool of isolated encoding nucleotide
sequences can be inserted into expression vectors, the resulting vectors
transformed or transfected into host cells, and the resulting cells plated out into
individual colonies. The invention provides a variety of recombinant DNA
compounds in which the various coding sequences for the domains and modules
of the megalomicin PKS are flanked by non-naturally occurring restriction enzyme
recognition sites.

The various PKS nucleotide sequences can be cloned into one or more
recombinant vectors as individual cassettes, with separate control elements, or
under the control of, e.g., a single promoter. The PKS subunit encoding regions
can include flanking restriction sites to allow for the easy deletion and insertion of
other PKS subunit encoding sequences so that hybrid PKSs can be generated. The
design of such unique restriction sites is known to those of skill in the art and can
be accomplished using the techniques described above, such as site-directed
mutagenesis and PCR.
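
The use of unique flanking restriction sites to exchange one coding cassette for another can be sketched at the sequence level as follows. The sketch is a deliberate simplification: it treats the construct as a string, requires each recognition site to occur exactly once, and does not model cleavage positions or overhangs. The PstI (CTGCAG) and XbaI (TCTAGA) recognition sequences are used merely as examples, and the function name and sequences are hypothetical.

```python
def swap_cassette(construct, left_site, right_site, new_cassette):
    """Replace the sequence between two unique flanking sites with new_cassette."""
    if construct.count(left_site) != 1 or construct.count(right_site) != 1:
        raise ValueError("each flanking site must occur exactly once in the construct")
    start = construct.index(left_site) + len(left_site)
    end = construct.index(right_site)
    if end < start:
        raise ValueError("left site must precede right site")
    # Keep both recognition sites so the cassette can be exchanged again later.
    return construct[:start] + new_cassette + construct[end:]


PSTI, XBAI = "CTGCAG", "TCTAGA"
construct = "AAAA" + PSTI + "GGGTTT" + XBAI + "CCCC"
print(swap_cassette(construct, PSTI, XBAI, "ACGT"))
```

Requiring uniqueness of both sites reflects the point made above that the sites are designed to be unique within the construct.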

The expression vectors containing nucleotide sequences encoding a variety
of PKS enzymes for the production of different polyketides are then transformed
into the appropriate host cells to construct the library. In one straightforward
approach, a mixture of such vectors is transformed into the selected host cells and
the resulting cells plated into individual colonies and selected to identify
successful transformants. Each individual colony has the ability to produce a
particular PKS synthase and ultimately a particular polyketide. Typically, there
will be duplications in some, most, or all of the colonies; the subset of the
transformed colonies that contains a different PKS in each member colony can be
considered the library. Alternatively, the expression vectors can be used
individually to transform hosts, which transformed hosts are then assembled into a
library. A variety of strategies are available to obtain a multiplicity of colonies
each containing a PKS gene cluster derived from the naturally occurring host gene
cluster so that each colony in the library produces a different PKS and ultimately a
different polyketide. The number of different polyketides that are produced by the
library is typically at least four, more typically at least ten, and preferably at least
20, and more preferably at least 50, reflecting similar numbers of different altered
PKS gene clusters and PKS gene products. The number of members in the library
is arbitrarily chosen; however, the degrees of freedom outlined above with respect
to the variation of starter, extender units, stereochemistry, oxidation state, and
chain length enable the production of quite large libraries.

Methods for introducing the recombinant vectors of the invention into
suitable hosts are known to those of skill in the art and typically include the use of
CaCl2 or agents such as other divalent cations, lipofection, DMSO, protoplast
transformation, conjugation, infection, transfection, and electroporation. The
polyketide producing colonies can be identified and isolated using known
techniques and the produced polyketides further characterized. The polyketides
produced by these colonies can be used collectively in a panel to represent a
library or may be assessed individually for activity.
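
The statement above that the library is the subset of transformed colonies containing a different PKS in each member colony can be illustrated by a short sketch. The colony identifiers and genotype labels are hypothetical, and the sketch assumes that the PKS genotype of each colony has already been determined by some means not shown here.

```python
def distinct_library(colonies):
    """Keep one representative colony per distinct PKS genotype.

    colonies maps a colony identifier to a label for the PKS it carries.
    The first colony seen for each genotype is retained.
    """
    library = {}
    for colony_id, genotype in colonies.items():
        library.setdefault(genotype, colony_id)
    return library


# Hypothetical plate: five colonies, three distinct PKS genotypes.
plate = {
    "c1": "native",
    "c2": "AT3-malonyl",
    "c3": "native",
    "c4": "KR5-deleted",
    "c5": "AT3-malonyl",
}
library = distinct_library(plate)
print(len(library), library)
```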

The libraries of the invention can thus be considered at four levels: (1) a
multiplicity of colonies each with a different PKS encoding sequence; (2) the
proteins produced from the coding sequences; (3) the polyketides produced from
the proteins assembled into a functional PKS; and (4) antibiotics or compounds
with other desired activities derived from the polyketides. Of course, combination
libraries can also be constructed wherein members of a library derived, for
example, from the megalomicin PKS can be considered as a part of the same
library as those derived from, for example, the rapamycin PKS or DEBS.

Colonies in the library are induced to produce the relevant synthases and
thus to produce the relevant polyketides to obtain a library of polyketides. The
polyketides secreted into the media can be screened for binding to desired targets,
such as receptors, signaling proteins, and the like. The supernatants per se can be
used for screening, or partial or complete purification of the polyketides can first
be effected. Typically, such screening methods involve detecting the binding of
each member of the library to receptor or other target ligand. Binding can be
detected either directly or through a competition assay. Means to screen such
libraries for binding are well known in the art and can be applied in accordance
with the methods of the present invention. Alternatively, individual polyketide
members of the library can be tested against a desired target. In this event, screens
wherein the biological response of the target is measured can more readily be
included. Antibiotic activity can be verified using typical screening assays such as
those set forth in Lehrer et al., 1991, J. Immunol. Meth. 137: 167-173, incorporated
herein by reference, and in the Examples below.

The invention provides methods for the preparation of a large number of
polyketides. These polyketides are useful intermediates in formation of
compounds with antibiotic or other activity through hydroxylation, epoxidation,
and glycosylation reactions as described above. In general, the polyketide products
of the PKS must be further modified, typically by hydroxylation and glycosylation,
to exhibit potent antibiotic activity. Hydroxylation results in the novel polyketides
of the invention that contain hydroxyl groups at C-6, which can be accomplished
using the hydroxylase encoded by the eryF gene, and/or C-12, which can be
accomplished using the hydroxylase encoded by the picK or eryK gene. Also, the
oleP gene is available in recombinant form, which can be used to express the oleP
gene product in any host cell. A host cell, such as a Streptomyces host cell or a
Saccharopolyspora erythraea host cell, modified to express the oleP gene thus can
be used to produce polyketides comprising the C-8-C-8a epoxide present in
oleandomycin. Thus the invention provides such modified polyketides. The
presence of hydroxyl groups at these positions can enhance the antibiotic activity
of the resulting compound relative to its unhydroxylated counterpart.

Methods for glycosylating polyketides are generally known in the art and
can be applied in accordance with the methods of the present invention; the
glycosylation may be effected intracellularly by providing the appropriate
glycosylation enzymes or may be effected in vitro using chemical synthetic means
as described herein and in PCT publication No. WO 98/49315, incorporated
herein by reference. Preferably, glycosylation with desosamine, mycarose, and/or
megosamine is effected in accordance with the methods of the invention in
recombinant host cells provided by the invention. In general, the approaches to
effecting glycosylation mirror those described above with respect to
hydroxylation. The purified enzymes, isolated from native sources or
recombinantly produced, may be used in vitro. Alternatively and as noted,
glycosylation may be effected intracellularly using endogenous or recombinantly
produced intracellular glycosylases. In addition, synthetic chemical methods may
be employed.

The antibiotic modular polyketides may contain any of a number of
different sugars, although D-desosamine, or a close analog thereof, is most
common. Erythromycin, picromycin, megalomicin, narbomycin, and methymycin
contain desosamine. Erythromycin also contains L-cladinose (3-O-methyl
mycarose). Tylosin contains mycaminose (4-hydroxy desosamine), mycarose and
6-deoxy-D-allose. 2-acetyl-1-bromodesosamine has been used as a donor to
glycosylate polyketides by Masamune et al., 1975, J. Am. Chem. Soc. 97: 3512-
3513. Other, apparently more stable donors include glycosyl fluorides,
thioglycosides, and trichloroacetimidates; see Woodward et al., 1981, J. Am.
Chem. Soc. 103: 3215; Martin et al., 1997, J. Am. Chem. Soc. 119: 3193; Toshima
et al., 1995, J. Am. Chem. Soc. 117: 3717; Matsumoto et al., 1988, Tetrahedron
Lett. 29: 3575. Glycosylation can also be effected using the polyketide aglycones
as starting materials and using Saccharopolyspora erythraea or Streptomyces
venezuelae or other host cell to make the conversion, preferably using mutants
unable to synthesize macrolides, as discussed in the preceding Section.

Thus, a wide variety of polyketides can be produced by the hybrid PKS
enzymes of the invention. These polyketides are useful as antibiotics and as
intermediates in the synthesis of other useful compounds, as described in the
following section.

Section VII: Host Cells Containing Multiple Expression Vectors 

A recombinant host cell of the invention may contain nucleic acid
encoding a megalomicin PKS domain, module, or protein, or megalomicin
modification enzyme at a single genetic locus, e.g., on a single plasmid or at a
single chromosomal locus, or at different genetic loci, e.g., on separate plasmids
and/or chromosomal loci. By "multiple" is meant two or more; by "vector" is
meant a nucleic acid molecule which can be used to transform host systems and
which contains an independent expression system containing a coding sequence
under control of a promoter and optionally a selectable marker and any other
suitable sequences regulating expression. Typical such vectors are plasmids, but
other vectors such as phagemids, cosmids, viral vectors and the like can be used
according to the nature of the host. Of course, one or more of the separate vectors
may integrate into the chromosome of the host (selection may not be required for
maintenance of integrated vectors).

In one embodiment, the invention provides a recombinant host cell, which
comprises at least two separate autonomously replicating recombinant DNA
expression vectors, each of said vectors comprises a recombinant DNA compound
encoding a megalomicin PKS domain or a megalomicin modification enzyme
operably linked to a promoter. In another embodiment, the invention provides a
recombinant host cell, which comprises at least one autonomously replicating
recombinant DNA expression vector and at least one modified chromosome, each
of said vector(s) and each of said modified chromosome comprises a recombinant
DNA compound encoding a megalomicin PKS domain or a megalomicin
modification enzyme operably linked to a promoter. Preferably, the autonomously
replicating recombinant DNA expression vector and/or the modified chromosome
further comprises distinct selectable markers.

The above multiple-vector (chromosome) expression systems can also be
used for expressing heterogeneous polyketide biosynthetic enzymes, e.g., for
expressing Micromonospora megalomicea megalomicin PKS protein, module, or
domain or a megalomicin modification enzyme with a PKS protein, module, or
domain, or modification enzyme from other origins in the same host cells. By
placing various activities on different expression vectors, a high degree of
variation can be achieved in an efficient manner. A variety of hosts can be used;
any suitable host cell that can maintain multiple vectors can readily be used.
Preferred hosts include Streptomyces, yeast, E. coli, other actinomycetes, and plant
cells, and mammalian or insect cells or other suitable recombinant hosts can also
be used. Preferred among yeast strains are Saccharomyces cerevisiae and Pichia
pastoris. Preferred actinomycetes include various strains of Streptomyces.

If one chooses to use a host cell that does not naturally produce a
polyketide, then one may need to ensure that the recombinant host is modified to
also contain a holo ACP synthase activity that effects pantetheinylation of the acyl
carrier protein. See PCT Pub. No. WO 97/13845, incorporated herein by
reference. One of the multiple vectors may be used for this purpose. This
activation step is necessary for activation of the ACP. The expression system for
the holo ACP synthase may be supplied on a vector separate from that carrying a
PKS coding sequence or may be supplied on the same vector or may be integrated
into the chromosome of the host, or may be supplied as an expression system for a
fusion protein with all or a portion of a polyketide synthase (see U.S. Patent No.
6,033,883, incorporated herein by reference).

It should be noted that in some recombinant hosts, it may also be necessary
to activate the polyketides produced through postsynthesis modifications when
polyketides having such modifications are desired. If this is the case for a
particular host, the host will be modified, for example by transformation, to
contain those enzymes necessary for effecting these modifications. Among such
enzymes, for example, are glycosylation enzymes. The use of multiple vectors can
facilitate the introduction of expression systems for such enzymes.

In a preferred embodiment, the multiple vector system is used to assemble
rapidly and efficiently a combinatorial library of polyketides and the
PKS/modification enzymes that produce them. In an illustrative embodiment, the
multiple vector system comprises four different vectors, one comprising the megAI
gene, one the megAII gene, one the megAIII gene, and one the modification
enzyme(s) gene(s). Each of these vectors can be modified to make a set of vectors.
For example, one set could contain all possible AT substitutions in the loading and
first and second extender modules of the megAI gene product. Another set could
contain expression systems for a variety of different modification enzymes. With
these four vector sets and by combining each member of each set with each
member of the other three sets, a very large library of cells, vector sets, and
polyketides can be rapidly and efficiently assembled.
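
The combination of each member of each of the four vector sets with each member of the other three sets corresponds to a Cartesian product, which can be sketched as follows. The vector names and set sizes are hypothetical placeholders chosen for illustration.

```python
from itertools import product


def assemble_library(meg_ai, meg_aii, meg_aiii, modification):
    """Return every combination of one vector from each of the four sets."""
    return list(product(meg_ai, meg_aii, meg_aiii, modification))


# Hypothetical vector sets; each name stands for one variant expression vector.
meg_ai_set = ["megAI-native", "megAI-AT1swap", "megAI-AT2swap"]
meg_aii_set = ["megAII-native", "megAII-KR4del"]
meg_aiii_set = ["megAIII-native", "megAIII-AT6swap"]
modification_set = ["glycosylation", "hydroxylation"]

combinations = assemble_library(meg_ai_set, meg_aii_set, meg_aiii_set, modification_set)
print(len(combinations))  # 3 * 2 * 2 * 2 combinations
print(combinations[0])
```

The library size is the product of the set sizes, so adding a single variant to any one set multiplies the number of host cell combinations accordingly.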

The combinatorial potential of a modular PKS such as the megalomicin
PKS (ignoring the additional potential of different modification enzyme systems)
is minimally given by: $AT_L \times (AT_E \times 4)^M$, where $AT_L$ is the number of loading
acyl transferases, $AT_E$ is the number of extender acyl transferases, and $M$ is the
number of modules in the gene cluster. The number 4 is present in the formula
because this represents the number of ways a keto group can be modified by either
1) no reaction; 2) KR activity alone; 3) KR+DH activity; or 4) KR+DH+ER
activity. It has been shown that expression of only the first two modules of the
erythromycin PKS resulted in the production of a predicted truncated triketide
product (see Kao et al., J. Am. Chem. Soc., 116:11612-11613 (1994)). A novel
12-membered macrolide similar to methymycin aglycone was produced by
expression of modules 1-5 of this PKS in S. coelicolor (see Kao et al., J. Am.
Chem. Soc., 117:9105-9106 (1995)). This work shows that PKS modules are
functionally independent so that lactone ring size can be controlled by the number
of modules present.

In addition to controlling the number of modules, the modules can be
genetically modified, for example, by the deletion of a ketoreductase domain as
described by Donadio et al., Science, 252:675-679 (1991); and Donadio et al.,
Gene, 115:97-103 (1992). In addition, the mutation of an enoyl reductase domain
was reported by Donadio et al., Proc. Natl. Acad. Sci., 90:7119-7123 (1993).
These modifications also resulted in modified PKS and thus modified polyketides.
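
The minimal combinatorial potential given above, the number of loading acyl transferases multiplied by the M-th power of four times the number of extender acyl transferases, can be evaluated directly. The function name and the example values are hypothetical and are chosen only to show the scale of the result.

```python
def combinatorial_potential(at_l, at_e, m):
    """Evaluate AT_L * (AT_E * 4) ** M.

    at_l: number of loading acyl transferases
    at_e: number of extender acyl transferases
    m:    number of modules in the gene cluster
    The factor 4 counts the reductive outcomes per module:
    none, KR, KR+DH, or KR+DH+ER.
    """
    if min(at_l, at_e, m) < 0:
        raise ValueError("all arguments must be non-negative")
    return at_l * (at_e * 4) ** m


# Hypothetical values: 2 loading ATs, 3 extender ATs, 6 modules.
print(combinatorial_potential(2, 3, 6))  # 2 * 12**6 = 5971968
```

Even with these modest assumed values the count is in the millions, consistent with the observation that quite large libraries are possible.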

As stated above, in the present invention, the coding sequences for
catalytic activities derived from the megalomicin PKS systems found in nature can
be used in their native forms or modified by standard mutagenesis techniques to
delete or diminish activity or to introduce an activity into a module in which it was
not originally present. For example, a KR activity can be introduced into a
module normally lacking that function.

In one embodiment of the invention herein, a single host cell is modified to
contain a multiplicity of vectors, each vector contributing a portion of the
synthesis of a megalomicin PKS and modification enzyme (if any) system. Each
of the multiple vectors for production of the megalomicin PKS system typically
encodes at least two modules, and at least one of the vectors integrates into the
chromosome of the host. Integration can be effected using suitable phage or
integrating vectors or by homologous recombination. If homologous
recombination is used, the integration event may also be designed to delete
endogenous PKS genes residing in the chromosome, as described in the PCT
application WO 95/08548. In these embodiments, too, a selectable marker such as
hygromycin or thiostrepton resistance can be included in the vector that effects
integration.

As mentioned above, additional enzymes that effect post-translational
modifications to the enzyme systems in the megalomicin PKS may be introduced
into the host through suitable recombinant expression systems. In addition,
enzymes that activate the polyketides themselves, for example, through
glycosylation may be added. It may also be desirable to modify the cell to produce
more of a particular substrate utilized in polyketide biosynthesis. For example, it
is generally believed that malonyl CoA levels in yeast are higher than
methylmalonyl CoA; if yeast is chosen as a host, it may be desirable to increase
methylmalonyl CoA levels by the addition of one or more biosynthetic enzymes
therefor.

The multiple-vector expression system can also be used to make
polyketides produced by the addition of synthetic starter units to a PKS that
contains an inactivated ketosynthase (KS) in the first module. As noted above,
this modification permits the system to incorporate a suitable diketide thioester
such as 3-hydroxy-2-methyl pentanoic acid-N-acetyl cysteamine thioester, or
similar thioesters of diketide analogs, as described by Jacobsen et al., Science,
277:367-369 (1997). The construction of PKS modules containing inactivated
ketosynthase regions can be conducted by methods known in the art, such as the
method described in U.S. Patent No. 6,080,555 and PCT publication Nos. WO
99/03986 and WO 97/02358, each of which is incorporated herein by reference, in
accordance with the methods of the present invention.

The multiple-vector expression system can be used to produce polyketides
in hosts that normally do not produce them, such as E. coli and yeast. It also
provides more efficient means to provide a variety of polyketide products by
supplying the elements of the introduced PKS, whether in an E. coli or yeast host
or in other more traditionally used hosts, such as Streptomyces. The invention
also includes libraries of polyketides prepared using the methods of the invention.


Section VIII: Compounds 

The methods and recombinant DNA compounds of the invention are useful
in the production of polyketides. In one important aspect, the invention provides
methods for making antibiotic compounds related in structure to erythromycin, a
potent antibiotic compound. The invention also provides novel ketolide
compounds, polyketide compounds with potent antibiotic activity of significant
interest due to activity against antibiotic resistant strains of bacteria. See
Griesgraber et al., 1996, J. Antibiot. 49: 465-477, incorporated herein by
reference. Most if not all of the ketolides prepared to date are synthesized using
erythromycin A, a derivative of 6-dEB, as an intermediate. In one embodiment,
the present invention provides the 3-keto derivatives of the megalomicins for use
as antibiotics. In particular, the 3-keto derivative of megalomicin A is a preferred
ketolide of the invention. These compounds can be made chemically, substantially
in accordance with the procedures for making ketolides described in the prior art,
or in recombinant host cells of the invention in which the megosamine and
desosamine biosynthetic and transferase genes are present but which do not make
or transfer the mycarose moiety and/or in which the PKS has been modified to
delete the KR domain of extender module 6. The invention also provides methods
for making intermediates useful in preparing traditional, 6-dEB- and
erythromycin-derived ketolide compounds. See Griesgraber et al., supra;
Agouridas et al., 1998, J. Med. Chem. 41: 4080-4100; U.S. Patent Nos. 5,770,579;
5,760,233; 5,750,510; 5,747,467; 5,747,466; 5,656,607; 5,635,485; 5,614,614;
5,556,118; 5,543,400; 5,527,780; 5,444,051; 5,439,890; 5,439,889; and PCT
publication Nos. WO 98/09978 and 98/28316, each of which is incorporated herein
by reference.

As noted above, the hybrid PKS genes of the invention can be expressed in
a host cell that contains the desosamine, megosamine, and/or mycarose
biosynthetic genes and corresponding transferase genes as well as the required
hydroxylase gene(s), which may, for example and without limitation, be either
picK, megK, or eryK (for the C-12 position) and/or megF or eryF (for the C-6
position). The resulting compounds have antibiotic activity but can be further
modified, as described in the patent publications referenced above, to yield a
desired compound with improved or otherwise desired properties. Alternatively,
the aglycone compounds can be produced in the recombinant host cell, and the
desired glycosylation and hydroxylation steps carried out in vitro or in vivo, in
the latter case by supplying the converting cell with the aglycone, as described
above.

The compounds of the invention are thus optionally glycosylated forms of
the polyketide set forth in formula (1) below which are hydroxylated at either the
C-6 or the C-12 position or both. The compounds of formula (1) can be prepared
using the loading and the six extender modules of a modular PKS, modified or
prepared in hybrid form as herein described. These polyketides have the formula:

including the glycosylated and isolated stereoisomeric forms thereof;

wherein R* is a straight chain, branched or cyclic, saturated or unsaturated
substituted or unsubstituted hydrocarbyl of 1-15C;

each of R1-R6 is independently H or alkyl (1-4C) wherein any alkyl at R
may optionally be substituted;

each of X1-X5 is independently two H, H and OH, or =O; or

each of X1-X5 is independently H and the compound of formula (2)
contains a double-bond in the ring adjacent to the position of said X at 2-3, 4-5,
6-7, 8-9 and/or 10-11;

with the proviso that:

at least two of R1-R6 are alkyl (1-4C).

Preferred compounds comprising formula 2 are those wherein at least three
of R1-R5 are alkyl (1-4C), preferably methyl or ethyl; more preferably wherein at
least four of R1-R5 are alkyl (1-4C), preferably methyl or ethyl. Also preferred are
those wherein X2 is two H, =O, or H and OH, and/or X3 is H, and/or X1 is OH
and/or X4 is OH and/or X5 is OH. Also preferred are compounds with variable R*
when R1-R5 are methyl, X2 is =O, and X1, X4 and X5 are OH. The glycosylated
forms (i.e., mycarose or cladinose at C-3, desosamine at C-5, and/or megosamine
at C-6) of the foregoing are also preferred.

As described above, there are a wide variety of diverse organisms that can
modify compounds such as those described herein to provide compounds with or
that can be readily modified to have useful activities. For example,
Saccharopolyspora erythraea can convert 6-dEB to a variety of useful
compounds. The compounds provided by the present invention can be provided to
cultures of Saccharopolyspora erythraea and converted to the corresponding
derivatives of erythromycins A, B, C, and D in accordance with the procedure
provided in the Examples, below. To ensure that only the desired compound is
produced, one can use an S. erythraea eryA mutant that is unable to produce
6-dEB but can still carry out the desired conversions (Weber et al., 1985, J.
Bacteriol. 164(1): 425-433). Also, one can employ other mutant strains, such as
eryB, eryC, eryG, and/or eryK mutants, or mutant strains having mutations in
multiple genes, to accumulate a preferred compound. The conversion can also be
carried out in large fermentors for commercial production. Each of the
erythromycins A, B, C, and D has antibiotic activity, although erythromycin A has
the highest antibiotic activity. Moreover, each of these compounds can form,
under treatment with mild acid, a C-6 to C-9 hemiketal with motilide activity. For
formation of hemiketals with motilide activity, erythromycins B, C, and D are
preferred, as the presence of a C-12 hydroxyl allows the formation of an inactive
compound that has a hemiketal formed between C-9 and C-12.

Thus, the present invention provides the compounds produced by
hydroxylation and glycosylation of the compounds of the invention by action of
the enzymes endogenous to Saccharopolyspora erythraea and mutant strains of S.
erythraea. Such compounds are useful as antibiotics or as motilides directly or
after chemical modification. For use as antibiotics, the compounds of the
invention can be used directly without further chemical modification.
Erythromycins A, B, C, and D all have antibiotic activity, and the corresponding
compounds of the invention that result from the compounds being modified by
Saccharopolyspora erythraea also have antibiotic activity. These compounds can
be chemically modified, however, to provide other compounds of the invention
with potent antibiotic activity. For example, alkylation of erythromycin at the C-6
hydroxyl can be used to produce potent antibiotics (clarithromycin is C-6-O-
methyl), and other useful modifications are described in, for example, Griesgraber
et al., 1996, J. Antibiot. 49: 465-477; Agouridas et al., 1998, J. Med. Chem. 41:
4080-4100; U.S. Patent Nos. 5,770,579; 5,760,233; 5,750,510; 5,747,467;
5,747,466; 5,656,607; 5,635,485; 5,614,614; 5,556,118; 5,543,400; 5,527,780;
5,444,051; 5,439,890; and 5,439,889; and PCT publication Nos. WO 98/09978
and 98/28316, each of which is incorporated herein by reference.

For use as motilides, the compounds of the invention can be used directly
without further chemical modification. Erythromycin and certain erythromycin
analogs are potent agonists of the motilin receptor that can be used clinically as
prokinetic agents to induce phase III of migrating motor complexes, to increase
esophageal peristalsis and LES pressure in patients with GERD, to accelerate
gastric emptying in patients with gastric paresis, and to stimulate gall bladder
contractions in patients after gallstone removal and in diabetics with autonomic
neuropathy. See Peeters, 1999, Motilide Web Site,
http://www.med.kuleuven.ac.be/med/gih/motilid.htm, and Omura et al., 1987,
Macrolides with gastrointestinal motor stimulating activity, J. Med. Chem. 30:
1941-3. The corresponding compounds of the invention that result from the
compounds of the invention being modified by Saccharopolyspora erythraea also
have motilide activity, particularly after conversion, which can also occur in vivo,
to the C-6 to C-9 hemiketal by treatment with mild acid. Compounds lacking the
C-12 hydroxyl are especially preferred for use as motilin agonists. These
compounds can also be further chemically modified, however, to provide other
compounds of the invention with potent motilide activity.

Moreover, and also as noted above, there are other useful organisms that
can be employed to hydroxylate and/or glycosylate the compounds of the
invention. As described above, the organisms can be mutants unable to produce
the polyketide normally produced in that organism, the fermentation can be carried
out on plates or in large fermentors, and the compounds produced can be
chemically altered after fermentation. In addition to Saccharopolyspora erythraea,
Streptomyces venezuelae, S. narbonensis, S. antibioticus, Micromonospora
megalomicea, S. fradiae, and S. thermotolerans can also be used. In addition to
antibiotic activity, compounds of the invention produced by treatment with M.
megalomicea enzymes can have antiparasitic activity as well. Thus, the present
invention provides the compounds produced by hydroxylation and glycosylation
by action of the enzymes endogenous to S. erythraea, S. venezuelae, S.
narbonensis, S. antibioticus, M. megalomicea, S. fradiae, and S. thermotolerans.






The present invention also provides methods and genetic constructs for
producing the glycosylated and/or hydroxylated compounds of the invention
directly in the host cell of interest. Thus, the recombinant genes of the invention,
which include recombinant megAI, megAII, and megAIII genes with one or more
deletions and/or insertions, including replacements of a megA gene fragment with
a gene fragment from a heterologous PKS gene, can be included on expression
vectors suitable for expression of the encoded gene products in
Saccharopolyspora erythraea, Micromonospora megalomicea, S. venezuelae, S.
narbonensis, S. antibioticus, S. fradiae, and S. thermotolerans.

The compounds of the invention can be produced by growing and
fermenting the host cells of the invention under conditions known in the art for the
production of other polyketides. The compounds of the invention can be isolated
from the fermentation broths of these cultured cells and purified by standard
procedures. The compounds can be readily formulated to provide the
pharmaceutical compositions of the invention. The pharmaceutical compositions
of the invention can be used in the form of a pharmaceutical preparation, for
example, in solid, semisolid, or liquid form. This preparation will contain one or
more of the compounds of the invention as an active ingredient in admixture with
an organic or inorganic carrier or excipient suitable for external, enteral, or
parenteral application. The active ingredient may be compounded, for example,
with the usual non-toxic, pharmaceutically acceptable carriers for tablets, pellets,
capsules, suppositories, solutions, emulsions, suspensions, and any other form
suitable for use.

The carriers which can be used include water, glucose, lactose, gum acacia,
gelatin, mannitol, starch paste, magnesium trisilicate, talc, corn starch, keratin,
colloidal silica, potato starch, urea, and other carriers suitable for use in
manufacturing preparations, in solid, semi-solid, or liquified form. In addition,
auxiliary stabilizing, thickening, and coloring agents and perfumes may be used.
For example, the compounds of the invention may be utilized with hydroxypropyl
methylcellulose essentially as described in U.S. Patent No. 4,916,138,
incorporated herein by reference, or with a surfactant essentially as described in
EPO patent publication No. 428,169, incorporated herein by reference.

Oral dosage forms may be prepared essentially as described by Hondo et
al., 1987, Transplantation Proceedings XIX, Supp. 6: 17-22, incorporated herein
by reference. Dosage forms for external application may be prepared essentially as
described in EPO patent publication No. 423,714, incorporated herein by
reference. The active compound is included in the pharmaceutical composition in
an amount sufficient to produce the desired effect upon the disease process or
condition.

For the treatment of conditions and diseases caused by infection, a
compound of the invention may be administered orally, topically, parenterally, by
inhalation spray, or rectally in dosage unit formulations containing conventional
non-toxic pharmaceutically acceptable carriers, adjuvants, and vehicles. The term
parenteral, as used herein, includes subcutaneous injections, and intravenous,
intramuscular, and intrasternal injection or infusion techniques.

Dosage levels of the compounds of the invention are of the order from
about 0.01 mg to about 50 mg per kilogram of body weight per day, preferably
from about 0.1 mg to about 10 mg per kilogram of body weight per day. The
dosage levels are useful in the treatment of the above-indicated conditions (from
about 0.7 mg to about 3.5 g per patient per day, assuming a 70 kg patient). In
addition, the compounds of the invention may be administered on an intermittent
basis, i.e., at semi-weekly, weekly, semi-monthly, or monthly intervals.
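
The per-patient figures follow from multiplying the per-kilogram dosage levels by
body weight. As a minimal arithmetic sketch (the function name and the use of
exact decimal arithmetic are illustrative assumptions, not part of the
specification), the conversion can be written as:

```python
from decimal import Decimal


def daily_dose_range_mg(low_mg_per_kg, high_mg_per_kg, body_weight_kg="70"):
    """Scale a per-kilogram daily dose range to a per-patient range in mg.

    Arguments are passed as strings so that Decimal keeps the stated
    values exact (binary floats would introduce rounding noise).
    """
    weight = Decimal(body_weight_kg)
    return Decimal(low_mg_per_kg) * weight, Decimal(high_mg_per_kg) * weight


# Broad range (0.01 to 50 mg/kg/day) and preferred range (0.1 to 10 mg/kg/day),
# both for the 70 kg patient assumed in the text.
broad_range = daily_dose_range_mg("0.01", "50")
preferred_range = daily_dose_range_mg("0.1", "10")
print("broad (mg/day):", broad_range)
print("preferred (mg/day):", preferred_range)
```

For a 70 kg patient, the broad range corresponds to 0.7 mg to 3500 mg (3.5 g) per
day and the preferred range to 7 mg to 700 mg per day.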

The amount of active ingredient that may be combined with the carrier
materials to produce a single dosage form will vary depending upon the host
treated and the particular mode of administration. For example, a formulation
intended for oral administration to humans may contain from 0.5 mg to 5 g of
active agent compounded with an appropriate and convenient amount of carrier
material, which may vary from about 5 percent to about 95 percent of the total
composition. Dosage unit forms will generally contain from about 0.5 mg to about
500 mg of active ingredient. For external administration, the compounds of the
invention may be formulated within the range of, for example, 0.00001% to 60%
by weight, preferably from 0.001% to 10% by weight, and most preferably from
about 0.005% to 0.8% by weight.

It will be understood, however, that the specific dose level for any
particular patient will depend on a variety of factors. These factors include the
activity of the specific compound employed; the age, body weight, general health,
sex, and diet of the subject; the time and route of administration and the rate of
excretion of the drug; whether a drug combination is employed in the treatment;
and the severity of the particular disease or condition for which therapy is sought.

A detailed description of the invention having been provided above, the
following examples are given for the purpose of illustrating the invention and shall
not be construed as being a limitation on the scope of the invention or claims.

Example 1 

Cloning and Characterization of the Megalomicin Biosynthetic Gene Cluster from
Micromonospora megalomicea
Experimental Procedures 
Bacterial Strains, Media, and Growth Conditions 

Routine DNA manipulations were performed in Escherichia coli XL1 Blue
or E. coli XL1 Blue MR (Stratagene) using standard culture conditions (Sambrook
et al., 1989). M. megalomicea subsp. nigra NRRL3275 was obtained from the
ATCC collection and cultured according to recommended protocols. For isolation
of genomic DNA, M. megalomicea was grown in TSB (Hopwood et al., 1985) at
30°C. S. lividans K4-114 (Ziermann and Betlach, 1999), which carries a deletion
of the actinorhodin biosynthetic gene cluster, was used as the host for expression
of the megAI-AIII genes. S. lividans strains were maintained on R5 agar at 30°C
and grown in liquid YEME for preparation of protoplasts (Hopwood et al., 1985).
S. erythraea NRRL2338 was used for expression of the megosamine genes. S.
erythraea strains were maintained on R5 agar at 34°C and grown in liquid TSB for
preparation of protoplasts.

Manipulation of DNA and Organisms 

Manipulation and transformation of DNA in E. coli were performed by
standard procedures (Sambrook et al., 1989) or by suppliers' protocols. Protoplasts
of S. lividans and S. erythraea were generated for transformation by plasmid DNA
using the standard procedure. S. lividans transformants were selected on R5 using
2 ml of a 0.5 mg/ml thiostrepton overlay. S. erythraea transformants were selected
on R5 using 1.5 ml of a 0.6 mg/ml apramycin overlay.
Isolation of the meg gene cluster

A cosmid library was prepared in SuperCos (Stratagene) from M.
megalomicea total DNA partially digested with Sau3A I, and introduced into E.
coli using a Gigapack III XL (Stratagene) in vitro packaging kit. 32P-labelled DNA
probes encompassing the KS2 domain from ery DEBS, or a mixture of segments
encompassing modules 1 and 2 from ery DEBS, were used separately to screen the
cosmid library by colony hybridization. Several colonies which hybridized with
the probes were further analyzed by sequencing the ends of their cosmid inserts
using T3 and T7 primers. BLAST (Altschul et al., 1990) analysis of the sequences
revealed several colonies with DNA sequences highly homologous to genes from
the ery cluster. Together with restriction analysis, this led to the isolation of two
overlapping cosmids, pKOS079-93A and pKOS079-93D, which covered ~45 kb of
the meg cluster. A 400 bp PCR fragment was generated from the left end of
pKOS079-93D and used to reprobe the cosmid library. Likewise, a 200 bp PCR
fragment generated from the right end of pKOS079-93A was used to reprobe the
cosmid library. Analysis of hybridizing colonies as described above resulted in
identification of two additional cosmids, pKOS079-138B and pKOS79-124B,
which overlap the previous two cosmids. BLAST analysis of the far left and right
end sequences of these cosmids indicated no homology to any known genes
related to polyketide biosynthesis, which indicates that the set of four
cosmids spans the entire megalomicin biosynthetic gene cluster.

DNA sequencing and analysis 

PCR-based double stranded DNA sequencing was performed on a
Beckman CEQ 2000 capillary sequencer using reagents and protocols provided by
the manufacturer. A shotgun library of the entire cosmid pKOS079-93D insert was
made as follows: DNA was first digested with Dra I to eliminate the vector
fragment, then partially digested with Sau3A I. After agarose electrophoresis,
bands between 1 and 3 kb were excised from the gel and ligated with BamH I
digested pUC19. Another shotgun library was generated from a 12 kb Xho I/EcoR I
fragment subcloned from cosmid pKOS079-93A to extend the sequence to the
megF gene. A 4 kb Bgl II/Xho I fragment from cosmid pKOS079-138B was
sequenced by primer walking to extend the sequencing to the megT gene.
Sequence was assembled using the Sequencher (Gene Codes Corp.) software
package and analyzed with MacVector (Oxford Molecular Group) and the NCBI
BLAST server (www.ncbi.nlm.nih.gov/BLAST/).


Plasmids 

Plasmid pKOS108-6 is a modified version of pKAO127'kan' (Ziermann
and Betlach, 1999; Ziermann and Betlach, 2000) in which the eryAI-III genes
between the Pac I and EcoR I sites have been replaced with the megAI-III genes.
This was done by first substituting a synthetic DNA duplex (5'-
TAAGAATTCGGAGATCTGGCCTCAGCTCTAGAC (SEQ ID NO: 21),
complementary oligo 5'-
AATTGTCTAGAGCTGAGGCCAGATCTCCGAATTCTTAAT (SEQ ID NO:
22)) between the Pac I and EcoR I sites of the pKAO127'kan' vector fragment.
The 22 kb EcoR I/Bgl II fragment from cosmid pKOS079-93D containing the
megAI-II genes was inserted into the EcoR I and Bgl II sites of the resulting
plasmid to generate pKOS024-84. A 12 kb Bgl II/BbvC I fragment containing the
megAIII gene and part of the megCII gene was subcloned from pKOS079-93A,
excised as a Bgl II/Xba I fragment, and ligated into the corresponding sites of
pKOS024-84 to yield the final expression plasmid pKOS108-6.
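
The design of the synthetic linker can be checked with a few lines of code. The
sketch below is illustrative only and is not part of the described procedure: it
takes the two oligonucleotide sequences given above, confirms that they are
complementary, and locates the recognition sites of the enzymes used in the
subsequent cloning steps. The helper names are hypothetical; the recognition
sequences are the standard ones for EcoR I, Bgl II, BbvC I, and Xba I.

```python
# SEQ ID NO: 21 and SEQ ID NO: 22 as given in the text (both written 5' to 3').
TOP = "TAAGAATTCGGAGATCTGGCCTCAGCTCTAGAC"
BOTTOM = "AATTGTCTAGAGCTGAGGCCAGATCTCCGAATTCTTAAT"

# Recognition sequences of the enzymes named in the cloning steps.
SITES = {
    "EcoR I": "GAATTC",
    "Bgl II": "AGATCT",
    "BbvC I": "CCTCAGC",
    "Xba I": "TCTAGA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def annealed_offset(top, bottom):
    """Index at which `top` pairs within the reverse complement of `bottom`.

    Returns -1 if the two oligonucleotides are not complementary.
    """
    return reverse_complement(bottom).find(top)


def site_positions(seq, sites):
    """Map each enzyme name to the 0-based index of its site in `seq` (-1 if absent)."""
    return {name: seq.find(motif) for name, motif in sites.items()}


offset = annealed_offset(TOP, BOTTOM)
positions = site_positions(TOP, SITES)
# Unpaired bases of the bottom strand on either side of the duplex region.
overhangs = (offset, len(BOTTOM) - len(TOP) - offset)
print(offset, positions, overhangs)
```

The top strand pairs entirely within the bottom strand, which extends beyond it
by two bases at one end and four bases (5'-AATT) at the other. These lengths are
consistent with ends generated by Pac I and EcoR I, respectively, and the duplex
carries single EcoR I, Bgl II, BbvC I, and Xba I sites in that order.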

The megosamine integrating vector, pKOS97-42, was constructed as
follows: A subclone was generated containing the 4 kb Xho I/Sca I fragment from
pKOS79-138B together with the 1.7 kb Sca I/Pst I fragment from pKOS79-93D in
Litmus 28 (Stratagene). The entire 5.7 kb fragment was then excised as a Spe I/Pst
I fragment and combined with the 6.3 kb Pst I/EcoR I fragment from pKOS79-93D
and EcoR I/Xba I digested pSET152 (Bierman et al., 1992) to construct plasmid
pKOS97-42.

Production and analysis of secondary metabolites 
Fermentation for production of polyketide, LC/MS analysis, and
quantification of 6-dEB for S. lividans K4-114/pKOS108-6 and S. lividans
K4-114/pKAO127'kan' were essentially as previously described (Xue et al., 1999).
S. erythraea NRRL2338 and S. erythraea/pKOS97-42 were grown for 6 days in F1
media (Brünker et al., 1998). Samples of broth were clarified in a microcentrifuge
(5 min, 13,000 rpm). For LC/MS preparation, isopropanol was added to the
supernatant (1:2 ratio) and the mixture was centrifuged again. Erythromycins and
megalomicins were detected by electrospray mass spectrometry and quantity was
determined by evaporative light scattering detection (ELSD). The LC retention
times and mass spectra of erythromycin and megalomicins were identical to those
of known standards.

Nucleotide sequence of the meg gene cluster 

A series of 4 overlapping inserts containing the meg cluster (Figure 9) was
isolated from a cosmid library prepared from total genomic DNA of M.
megalomicea; together the inserts cover >100 kb of the genome. A contiguous
48 kb segment which encodes the megalomicin PKS and several deoxysugar
biosynthetic genes was sequenced and analyzed. The segment contains 17
complete ORFs as well as an incomplete ORF at each end, organized as shown in
Figure 9.

PKS genes. The ORFs megAI, megAII and megAIII encode the polyketide
synthase responsible for synthesis of 6-dEB. The enzyme complex, meg DEBS, is
highly similar to ery DEBS, with each of the three predicted polypeptides sharing
an average of 83% overall similarity with their ery PKS counterpart. Both PKSs
are composed of 6 modules (2 modules per polypeptide) and each module is
organized in the identical manner (Figure 9). A dendrogram analysis (Schwecke et
al., 1995) employing 70 acyltransferase (AT) domains revealed that the 6 meg
extender AT domains cluster with AT domains that incorporate methylmalonyl
CoA (not shown). The loading module of meg DEBS also lacks a KSQ domain,
which is utilized by most macrolide PKSs for decarboxylation of the starter unit to
initiate polyketide synthesis (Bisang et al., 1999; Kuhstoss et al., 1996; Kakavas et
al., 1997; Xue et al., 1998), implying that priming begins with a propionate unit.
In addition, the Gly to Pro substitution in the NADPH-binding region of
the ketoreductase (KR) domain of module 3, which has been proposed to account
for the inactivity of this domain in ery DEBS (Donadio et al., 1991), is also
observed in meg DEBS.
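
The stated average can be reproduced from the per-polypeptide similarity values
listed for MegAI, MegAII, and MegAIII in Table 2. The following is a minimal
sketch; the variable names are illustrative assumptions.

```python
from statistics import mean

# Percent similarity of each meg DEBS polypeptide to its ery counterpart (Table 2).
PKS_SIMILARITY = {"MegAI": 81, "MegAII": 85, "MegAIII": 83}

average_similarity = mean(PKS_SIMILARITY.values())
print(f"average similarity: {average_similarity}%")
```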

Deoxysugar genes. BLAST (Altschul et al., 1990) analysis of the genes
flanking the PKS indicated that 12 complete ORFs and 1 partial ORF appear to
encode functions required for synthesis of one of the three megalomicin
deoxysugars. Assignment of each ORF to a specific deoxysugar pathway was
made based on comparison to the ery genes and other related genes involved in
deoxysugar biosynthesis (Table 2).

Table 2. Deduced functions of genes identified in the megalomicin gene cluster.

Gene     Closest Match (polypeptide) [a]  %Sim. [a]  Proposed Pathway     Proposed Function           Reference
megT     EryBVI                           -          Mycarose/Megosamine  2,3-Dehydratase             (Summers et al., 1997; Gaisser et al., 1997)
megDVI   EryCII                           63         Megosamine           3,4-Isomerase               (Summers et al., 1997)
megDI    EryCIII                          79         Megosamine           Glycosyltransferase         (Summers et al., 1997)
megY     AcyA (S. thermotolerans)         52         -                    Mycarose O-acyltransferase  (Arisawa et al., 1994)
megDII   EryCI                            58         Megosamine           Aminotransferase            (Dhillon et al., 1989; Summers et al., 1997)
megDIII  DesVI (S. venezuelae)            61         Megosamine           Dimethyltransferase         (Xue et al., 1998)
megDIV   DnmU (S. peucetius)              65         Megosamine           3,5-Epimerase               (Olano et al., 1999)
megDV    Dehydrogenase (A. orientalis)    61         Megosamine           4-Ketoreductase             (Summers et al., 1997; van Wageningen et al., 1998)
megDVII  EryBII                           73         Megosamine           2,3-Reductase               (Summers et al., 1997)
megBV    EryBV                            86         Mycarose             Glycosyltransferase         (Summers et al., 1997; Gaisser et al., 1997)
megBIV   EryBIV                           80         Mycarose             4-Ketoreductase             (Summers et al., 1997; Gaisser et al., 1997)
megAI    EryAI                            81         6-dEB                Polyketide Synthase         (Donadio and Katz, 1992)
megAII   EryAII                           85         6-dEB                Polyketide Synthase         (Donadio and Katz, 1992)
megAIII  EryAIII                          83         6-dEB                Polyketide Synthase         (Donadio and Katz, 1992)
megCII   EryCII                           82         Desosamine           3,4-Isomerase               (Summers et al., 1997)
megCIII  EryCIII                          89         Desosamine           Glycosyltransferase         (Summers et al., 1997)
megBII   EryBII                           87         Mycarose             2,3-Reductase               (Summers et al., 1997)
megH     EryH                             84         -                    Thioesterase                (Haydock et al., 1991)
megF     EryF                             -          -                    C-6 Hydroxylase             (Weber et al., 1991)

[a] Determined by BLASTX analysis using default parameters.
Three ORFs, megBV, megCIII and megDI, encode glycosyltransferases,
apparently one for attachment of each deoxysugar to the macrolide. MegBV was
most similar to EryBV, the erythromycin mycarosyltransferase, and hence was
assigned to the mycarose pathway in the meg cluster. The closest match for both of
the remaining glycosyltransferases was EryCIII, the desosaminyltransferase in
erythromycin biosynthesis. Given the higher degree of similarity between EryCIII
and MegCIII (Table 2), MegCIII was designated the desosaminyltransferase,
leaving MegDI as the proposed megosaminyltransferase. In similar fashion,
assignments were made for MegCII and MegDVI, two putative 3,4-isomerases
similar to EryCII; MegBII and MegDVII, 2,3-reductases homologous
to EryBII; and MegBIV and MegDV, putative 4-ketoreductases similar to EryBIV
(Table 2). The remaining ORFs involved in deoxysugar biosynthesis, megT,
megDII, megDIII and megDIV, encode a putative 2,3-dehydratase,
aminotransferase, dimethyltransferase and 3,5-epimerase, respectively (Table 2).
Since both the megosamine and desosamine pathways require an aminotransferase
and a dimethyltransferase, and since mycarose and megosamine each require a
2,3-dehydratase and a 3,5-epimerase, assignments of these four genes to a specific
pathway could not be made on the basis of sequence comparison alone. However,
the latter three are implicated in megosamine biosynthesis by experiments
described below.
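
The assignment rule used for the paired homologs can be sketched as follows.
This is an illustration of the reasoning only, not a tool used in the work
described: where two Meg proteins share the same closest Ery match, the one with
the higher similarity is assigned to the pathway of that Ery protein and the
other to the megosamine pathway. Only pairs whose similarity values in Table 2
are both measured against the same Ery protein are included; the data layout and
function name are assumptions.

```python
# For each Ery protein: its pathway, and the %Sim of two Meg homologs to it (Table 2).
PARALOG_PAIRS = {
    "EryCIII": ("desosamine", {"MegCIII": 89, "MegDI": 79}),
    "EryCII": ("desosamine", {"MegCII": 82, "MegDVI": 63}),
    "EryBII": ("mycarose", {"MegBII": 87, "MegDVII": 73}),
}


def assign_pathways(pairs):
    """Assign the more similar homolog to the Ery pathway, the other to megosamine."""
    assignments = {}
    for ery_pathway, scores in pairs.values():
        best = max(scores, key=scores.get)
        for protein in scores:
            assignments[protein] = ery_pathway if protein == best else "megosamine"
    return assignments


assignments = assign_pathways(PARALOG_PAIRS)
for protein, pathway in sorted(assignments.items()):
    print(protein, "->", pathway)
```

Applied to the Table 2 values, this rule reproduces the designations given above:
MegCIII and MegCII fall in the desosamine pathway, MegBII in the mycarose
pathway, and MegDI, MegDVI, and MegDVII in the megosamine pathway.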

Other genes. Two additional complete ORFs, designated megY and megH,
and an incomplete ORF, designated megF, were also identified in the cluster.
MegH and MegF share high degrees of similarity with EryH and EryF. EryH and
homologs in other macrolide gene clusters are thioesterase-like proteins with
unknown function in polyketide gene clusters (Haydock et al., 1991; Xue et al.,
1998; Butler et al., 1999; Tang et al., 1999). eryF encodes the erythronolide B C-6
hydroxylase (Figure 8) (Weber et al., 1991; Andersen and Hutchinson, 1992).
MegY does not have an ery counterpart but appears to belong to a (small) family
of O-acyltransferases that transfer short acyl chains to macrolides. Two classes
exist: AcyA and MdmB transfer acetyl or propionyl groups to the C-3 hydroxyls
on 16-membered macrolide rings (Arisawa et al., 1994; Hara and Hutchinson,
1992); CarE and Mpt transfer isovalerate or propionate to the mycarosyl moiety of
carbomycin and midecamycin, respectively (Epp et al., 1989; Arisawa et al., 1993;
Gu et al., 1996). The structures of various megalomicins suggest that MegY
belongs to the latter class and is the acyltransferase which converts megalomicin A
to megalomicins B, C1, or C2 (verified experimentally below).
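
The ORF counts given for the three gene groups can be reconciled with the totals
stated for the sequenced segment (17 complete ORFs and one incomplete ORF at each
end). The tally below is a simple consistency check; the grouping into a
dictionary and the function name are illustrative assumptions.

```python
# ORF counts per group, as stated in the text.
ORF_COUNTS = {
    "PKS": {"complete": 3, "incomplete": 0},         # megAI, megAII, megAIII
    "deoxysugar": {"complete": 12, "incomplete": 1},
    "other": {"complete": 2, "incomplete": 1},       # megY and megH; megF is incomplete
}


def tally(counts, key):
    """Sum one category ('complete' or 'incomplete') across all gene groups."""
    return sum(group[key] for group in counts.values())


complete_total = tally(ORF_COUNTS, "complete")
incomplete_total = tally(ORF_COUNTS, "incomplete")
print(complete_total, "complete ORFs;", incomplete_total, "incomplete ORFs")
```

The totals match the 19 genes listed in Table 2.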

Heterologous expression of the meg PKS genes.

The wild type and genetically modified versions of the ery DEBS have
been used extensively in heterologous Streptomyces hosts for enzyme studies and
the production of novel polyketide compounds. Given the similarities between the
ery and meg DEBSs, production characteristics were compared in a commonly
used Streptomyces host strain. The three megA ORFs were cloned into the
expression plasmid pKAO127'kan' (Ziermann and Betlach, 1999) in place of the
eryA ORFs. Both plasmids, pKAO127'kan' encoding ery DEBS and pKOS108-6
encoding meg DEBS, were introduced into Streptomyces lividans K4-114 and the
production of 6-dEB was determined in shake-flask fermentations. The production
profiles were similar in both cases and the maximum titer of 6-dEB was between
30 and 40 mg/L. In addition, both PKSs produced small amounts (~5%) of 8,8a-
deoxyoleandolide, which results from the priming of the PKS with acetate instead
of propionate (Kao et al., 1994b). This observation indicates that the loading AT
domains of the PKSs display similar relaxed specificities towards starter units.


Conversion of erythromycin to megalomicin in S. erythraea. 

An examination of the meg cluster revealed that the putative megosamine
biosynthetic genes are clustered directly upstream of the PKS genes. If the
hypothesis that these genes are sufficient for biosynthesis and attachment of
megosamine to an erythromycin intermediate is correct, then functional expression
of these genes in a strain which produces erythromycin, such as S. erythraea,
should result in production of megalomicin. A 12 kb DNA fragment carrying all
the genes between the leftmost Xho I site and the EcoR I site (Figure 9) was
integrated in the chromosome of S. erythraea using the site-specific integrating
vector pSET152 (Bierman et al., 1992). It was surmised that the left and right ends
of this fragment would contain necessary promoter regions for transcription of the
convergent set of genes in M. megalomicea and that they would likely operate in
S. erythraea.

Fermentation broth from S. erythraea/KOS97-42, which contains the
integrated meg genes, was analyzed by LC/MS and compared to LC/MS profiles
of the parent S. erythraea strain without the meg genes, as well as to megalomicin
standards purified from M. megalomicea. The new strain was found to produce a
mixture of erythromycin A and various megalomicins (~4:1 ratio), thereby
showing that the predicted megosamine biosynthetic and glycosyltransferase genes
are contained within the cloned meg fragment. The two most abundant congeners
identified were megalomicins B and C1. Megalomicins A and C2 were also
detected in smaller amounts. The presence of megalomicins B, C1 and C2 also
provides direct evidence for the function of the O-acyl transferase, MegY, which
is present in the integrated meg fragment.



Discussion
The homologies observed among modular PKSs enabled the use of ery
PKS genes to clone the meg biosynthetic gene cluster from M. megalomicea. The
close similarities between the megalomicin and erythromycin biosynthetic
pathways are also reflected in the overall organization of their genes and in the high
degree of homology of the corresponding individual gene-encoded polypeptides.
Production of 6-dEB from meg DEBS in S. lividans and conversion of
erythromycin to megalomicin using the megD genes in S. erythraea provide
direct evidence that the identified gene cluster is responsible for synthesis of
megalomicin.

As seen in Figure 9, the ~40 kb segments of the two clusters beginning
with ery/megBV on the left through the ery/megF genes retain a nearly identical
organizational arrangement. The notable differences in this region are eryG and
IS1136, which are absent from the segment of the meg cluster analyzed. The eryG
gene encodes an S-adenosylmethionine (SAM)-dependent mycarosyl
methyltransferase that converts erythromycin C to erythromycin A (Figure 8)
(Weber et al., 1990; Haydock et al., 1991). The mycarose moiety is modified by
esterification (MegY) in megalomicin biosynthesis (Figure 8) and, therefore, the
absence of an eryG homolog would be expected in the meg cluster. The IS1136
element located between eryAI and eryAII (Donadio and Staver, 1993) is not
known to play a role in erythromycin biosynthesis, and its origin in the ery cluster
has not been determined.

Upstream of the common meg/eryBIV and BV genes, the gene clusters
diverge. The ~6 kb segment between eryBV and eryK, the left border of the ery
gene cluster (Pereda et al., 1997), contains the remaining genes required for
mycarose (eryBVI and BVII) and desosamine biosynthesis (eryCIV, CV, and CVI)
and the C-12 hydroxylase (eryK) (Stassi et al., 1993). In contrast, the region
upstream of megBV encodes a set of genes (megDI-DVII and megY) which can
account for all the activities unique to megalomicin biosynthesis (Figure 9). Since
introduction of this meg DNA segment into S. erythraea results in production of
megalomicins, it is clear that these genes encode the functions for
TDP-megosamine biosynthesis and transfer to its putative substrate erythromycin C,
and for acylation of megalomicin A (Figure 8). The remaining region upstream of
megDVI should therefore encode genes only for mycarose and desosamine biosynthesis.

Olano et al. (1999) have recently described a pathway for
biosynthesis of TDP-L-daunosamine, a deoxysugar component of the antitumor
compounds daunorubicin and doxorubicin produced by Streptomyces peucetius.
Their pathway proposes four steps from the intermediate
TDP-4-keto-6-deoxyglucose controlled by the gene cluster dnmJQTUVZ, although the
functions of dnmQ and dnmZ could not be identified and the precise order of reactions in
the pathway could not be determined. The genes dnmT, dnmU, dnmJ and dnmV
each have proposed counterparts in the meg cluster, megT, megDIV, megDII, and
megDV, respectively (see Figure 10).

It is possible to describe a pathway to convert
TDP-2,6-dideoxy-3,4-diketo-D-hexose (or its enol tautomer), the last intermediate
common to the mycarose and megosamine pathways, to TDP-megosamine through the
sequence of 5-epimerization, 4-ketoreduction, 3-amination, and 3-N-dimethylation
employing the genes megDIV, megDV, megDII, and megDIII. This employs the
same functions proposed for biosynthesis of TDP-daunosamine by Olano et al.,
but in a different sequential order. However, it does not account for the megDVI
and megDVII genes, since their activities are not required for this route. A parallel
pathway that employs these genes is also shown in Figure 10. In this alternate
route, 2,3-reduction and 3,4-tautomerization are performed by the megDVII and
megDVI gene products, respectively. A unified single pathway that employs both
4-ketoreduction (megDV) and 2,3-reduction (megDVII) could not be determined.
Because the entire gene set from megDVI through megDVII was introduced into S.
erythraea to produce TDP-megosamine, it is not possible to determine which, if
either, of the two alternative pathways is operative, but this can be addressed
through systematic gene disruption and complementation.

The 48 kb segment sequenced also contains genes required for synthesis of
TDP-L-mycarose and TDP-D-desosamine (Figure 10). For the latter, megCII, which
encodes a putative 3,4-isomerase that catalyzes the first step in the committed
TDP-desosamine pathway, appears to be translationally coupled to megAIII, almost
exactly as its erythromycin counterpart, eryCII, was found to be translationally
coupled to eryAIII (Summers et al., 1997). The high degree of similarity between
MegCII and EryCII suggests that the pathways to desosamine in the megalomicin-
and erythromycin-producing organisms are most likely the same. Similarly, the
finding that megBII and megBIV, encoding a 2,3-reductase and a 4-ketoreductase,
have close homologs in the mycarose pathway for erythromycin also suggests that
TDP-L-mycarose synthesis in the two host organisms is the same.

Of interest are the two genes that encode putative 2,3-reductases, megBII
and megDVII. Because MegBII most closely resembles EryBII, a known mycarose
biosynthetic enzyme (Weber et al., 1990), and because megBII resides in the same
location of the meg cluster as its counterpart in the ery cluster, megBII is assigned
to the mycarose pathway and megDVII to the megosamine pathway. Furthermore,
the lower degree of similarity between MegDVII and either EryBII or MegBII
(Table 2) provides a basis for assigning the opposite L and D isomeric substrates
to each of the enzymes (Figure 10). Finally, megT, which encodes a putative
2,3-dehydratase, is also related to a gene in the ery mycarose pathway, eryBVI. In S.
erythraea, the proposed intermediate generated by EryBVI represents the first
committed step in the biosynthesis of mycarose (Figure 10). However, the
proposed pathways in Figure 10 suggest this may be an intermediate common to
both mycarose and megosamine biosynthesis in M. megalomicea. Therefore, megT
is named following the designation of the equivalent gene in the daunosamine
pathway, dnmT (Olano et al., 1999).

The preferred host-vector system for expression of meg DEBS described
here has been used previously for the heterologous expression of modular PKS
genes from the erythromycin (Kao et al., 1994a; Ziermann and Betlach, 1999),
picromycin (Tang et al., 1999) and oleandomycin pathways, as well as for the
generation of novel polyketide backbones in which domains have been removed,
added or exchanged in various combinations (McDaniel et al., 1999). Recently,
hybrid polyketides have been generated through the co-expression of subunits
from different PKS systems (Tang et al., 2000).

Expression of the megDVI-megDVII segment in S. erythraea and the
corresponding production of megalomicins in this host establishes the likely order
of sugar attachment in megalomicin synthesis. Furthermore, it provides a means to
produce megalomicin in a more genetically friendly host organism, leading to the
creation of megalomicin analogs by manipulating the PKS. Over 60 6-dEB
analogs have been produced by combinatorial biosynthesis using the ery PKS
(McDaniel et al., 1999; Xue et al., 1999). The titers of megalomicin could also be
significantly increased above the 5 mg/L obtained from M. megalomicea by
introducing the genes into an industrially optimized strain of S. erythraea, many of
which can produce as much as 10 g/L of erythromycin.

Example 2

Stabilizing meg PKS Expression Plasmid by Codon Engineering

Materials and methods

All bacterial strains were cultured and transformed as described in
Example 1.

Fermentation of Streptomyces and diketide feeding

Primary Streptomyces transformants were picked and placed in 6 mL of
TSB liquid medium with 50 ng/L of thiostrepton and grown at 30°C. When the
culture showed some growth (3-4 days), it was transferred into a 250 mL flask
containing 50 mL of R6 medium (pH 7.0) with 25 ug/L of thiostrepton and 1 g/L of
diketide ((2S,3R)-2-methyl-3-hydroxyhexanoate N-propionyl cysteamine thioester)
and placed in a 30°C incubator for 7 days.



Changing codons and making plasmids

There are several identical sequences in the coding sequences for module 2
and module 6 of the megalomicin PKS gene cluster. Expression plasmids
containing the full-length megalomicin PKS appeared to be somewhat unstable
and subject to deletion by intra-plasmid homologous recombination in recA+
strains such as ET12567 and Streptomyces. To prevent significant homologous
recombination and so stabilize the expression plasmids, the codons of two regions of
the module 6 coding sequence that are identical to regions in the module 2 coding
sequence were changed without changing the sequence of the protein encoded. The
two regions changed in module 6 were from the 26,739th base to the 27,267th base
and from the 27,697th base to the 27,987th base, which were identical to the regions
from the 6,810th base to the 7,338th base and from the 7,778th base to the
8,068th base, respectively. The start codon of the loading domain of the meg PKS
was set to be the 1st base. These sequences are shown below.



> 6810-7338 Sequence in Module 2
TTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT
TCGGCGGTGAATCAGGATGGGGCGAGTAATGGGTTGGCGGCGCCGTCGGGGGTGGCGCAG 
CAGCGGGTGATTCGGCGGGCGTGGGGTCGTGCGGGTGTGTCGGGTGGGGATGTGGGTGTG 
GTGGAGGCGCATGGGACGGGGACGCGGTTGGGGGATCCGGTGGAGTTGGGGGCGTTGTTG
GGGACGTATGGGGTGGGTCGGGGTGGGGTGGGTCCGGTGGTGGTGGGTTCGGTGAAGGCG
AATGTGGGTCATGTGCAGGCGGCGGCGGGTGTGGTGGGTGTGATCAAGGTGGTGTTGGGG 
TTGGGTCGGGGGTTGGTGGGTCCGATGGTGTGTCGGGGTGGGTTGTCGGGGTTGGTGGAT 
TGGTCGTCGGGTGGGTTGGTGGTGGCGGATGGGGTGCGGGGGTGGCCGGTGGGTGTGGAT 
GGGGTGCGTCGGGGTGGGGTGTCGGCGTTTGGGGTGTCGGGGACGAAT (SEQ ID NO: 23) 

> 26736-27267 Sequence in Module 6
CTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT
TCGGCGGTGAATCAGGATGGGGCGAGTAATGGGTTGGCGGCGCCGTCGGGGGTGGCGCAG 
CAGCGGGTGATTCGGCGGGCGTGGGGTCGTGCGGGTGTGTCGGGTGGGGATGTGGGTGTG 
GTGGAGGCGCATGGGACGGGGACGCGGTTGGGGGATCCGGTGGAGTTGGGGGCGTTGTTG
GGGACGTATGGGGTGGGTCGGGGTGGGGTGGGTCCGGTGGTGGTGGGTTCGGTGAAGGCG
AATGTGGGTCATGTGCAGGCGGCGGCGGGTGTGGTGGGTGTGATCAAGGTGGTGTTGGGG 
TTGGGTCGGGGGTTGGTGGGTCCGATGGTGTGTCGGGGTGGGTTGTCGGGGTTGGTGGAT
TGGTCGTCGGGTGGGTTGGTGGTGGCGGATGGGGTGCGGGGGTGGCCGGTGGGTGTGGAT 
GGGGTGCGTCGGGGTGGGGTGTCGGCGTTTGGGGTGTCGGGGACGAAT (SEQ ID NO: 24) 

> 26736-27267 Sequence with Codon Changes
CTGCAGCGCCTCTCCGTCGCCGTCCGCGAGGGCCGCCGAGTCCTCGGCGTCGTCGTCGGC
TCGGCCGTCAACCAAGACGGCGCGTCAAACGGCCTCGCCGCGCCCTCCGGCGTCGCCCAG 
CAGCGCGTCATACGCCGCGCGTGGGGACGCGCCGGAGTATCGGGCGGCGACGTCGGAGTC 
GTCGAGGCCCACGGCACCGGCACCCGCCTCGGGGATCCCGTCGAGCTGGGCGCCCTCCTG 
GGCACGTACGGCGTCGGCCGCGGCGGCGTCGGCCCGGTCGTCGTCGGCAGCGTCAAGGCC 
AACGTCGGCCACGTCCAGGCCGCGGCCGGCGTCGTCGGGGTCATCAAGGTCGTCCTCGGC
CTCGGCCGCGGGCTGGTCGGCCCGATGGTCTGCCGCGGCGGCCTCAGCGGCCTCGTCGAC 
TGGTCGTCCGGCGGCCTGGTCGTCGCGGACGGGGTCCGCGGCTGGCCGGTCGGCGTCGAC 
GGCGTCCGCCGGGGCGGCGTCTCGGCGTTCGGCGTCAGCGGGACGAAT (SEQ ID NO: 25) 



> 6978-7337 Sequence in Module 2
GGTGGAGTGTGATGCGGTGGTGTCGTCGGTGGTGGGGTTTTCGGTGTTGGGGGTGTTGGA
GGGTCGGTCGGGTGCGCCGTCGTTGGATCGGGTGGATGTGGTGCAGCCGGTGTTGTTCGT 
GGTGATGGTGTCGTTGGCGCGGTTGTGGCGGTGGTGTGGGGTTGTGCCTGCGGCGGTGGT 
GGGTCATTCGCAGGGGGAGATCGCGGCGGCGGTGGTGGCGGGGGTGTTGTCGGTGGGTGA
TGGTGCGCGGGTGGTGGCGTTGCGGGCGCGGGCGTTGCGGGCGTTGGCCGG (SEQ ID NO: 26)

> 27697-27987 Sequence in Module 6 

GGTGGAGTGTGATGCGGTGGTGTCGTCGGTGGTGGGGTTTTCGGTGTTGGGGGTGTTGGA 
GGGTCGGTCGGGTGCGCCGTCGTTGGATCGGGTGGATGTGGTGCAGCCGGTGTTGTTCGT 
GGTGATGGTGTCGTTGGCGCGGTTGTGGCGGTGGTGTGGGGTTGTGCCTGCGGCGGTGGT
GGGTCATTCGCAGGGGGAGATCGCGGCGGCGGTGGTGGCGGGGGTGTTGTCGGTGGGTGA
TGGTGCGCGGGTGGTGGCGTTGCGGGCGCGGGCGTTGCGGGCGTTGGCCGG (SEQ ID NO: 27)

> 27697-27987 Sequence with Codon Changes 

CGTGGAGTGCGATGCGGTCGTGTCGAGCGTCGTCGGCTTCAGCGTGCTGGGCGTCCTGGA
GGGCCGCAGCGGCGCCCCGAGCCTGGACCGCGTCGACGTGGTCCAGCCGGTCCTGTTCGT 
GGTCATGGTCAGCCTGGCCCGCCTGTGGCGCTGGTGCGGCGTGGTCCCGGCCGCCGTGGT 
CGGCCACAGCCAGGGCGAGATCGCCGCCGCGGTCGTGGCCGGCGTCCTGAGCGTCGGCGA 
CGGCGCCCGCGTCGTGGCCCTGCGCGCCCGCGCCCTGCGCGCCCTGGCCGG (SEQ ID NO: 28)
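
The effect of the codon changes can be illustrated on a short stretch of the listed sequences. The sketch below is a hedged example, not the procedure used to design the synthetic DNA: it takes 21 bases (bases 2 to 22) from the 27697-27987 module 6 sequence and from its codon-changed counterpart, assumes the reading frame starts at the first base of each excerpt (an assumption, since the frame is not stated in the text), and checks that both excerpts translate to the same peptide under the standard genetic code while sharing shorter runs of perfect identity. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate complete codons of dna, reading from the first base."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def mismatches(a, b):
    """Number of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def longest_identical_run(a, b):
    """Length of the longest stretch of consecutive identical aligned bases."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


# Excerpts (bases 2-22) of the module 6 sequence and its recoded version;
# the reading frame used here is an assumption made for illustration.
original = "GTGGAGTGTGATGCGGTGGTG"
recoded = "GTGGAGTGCGATGCGGTCGTG"

print(translate(original), translate(recoded))
print(mismatches(original, recoded), longest_identical_run(original, recoded))
```

Applied to full-length regions, the same three functions give a quick check that a recoded region still encodes the intended protein and show how far the longest uninterrupted stretch of identity to the module 2 copy has been reduced.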



Three pieces of DNA from the two regions above were synthesized and verified by
Retrogen, and the synthesized DNAs were cloned into pCR-Blunt II-TOPO, as
shown in Table 3 below.

Table 3. Plasmids containing synthesized DNA

Plasmid        Cloning sites and positions in meg PKS
pKOS97-1613    PstI-BamHI, 26,739th-26,947th base
pKOS97-1622    BamHI-BsmI, 26,947th-27,267th base
pKOS97-1628    SfaNI-FseI, 27,697th-27,987th base
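
The coordinates quoted in this example can be cross-checked with simple arithmetic. The following sketch assumes that all ranges are inclusive and 1-based (an assumption about the numbering convention) and uses hypothetical helper names. It confirms that each changed module 6 region has the same length as its module 2 counterpart and that the three synthesized pieces listed in Table 3 tile the two changed regions without gaps.

```python
# Inclusive, 1-based coordinates taken from the text and Table 3.
RECODED_MODULE6 = [(26739, 27267), (27697, 27987)]
MATCHING_MODULE2 = [(6810, 7338), (7778, 8068)]
SYNTHESIZED = {
    "pKOS97-1613": (26739, 26947),
    "pKOS97-1622": (26947, 27267),
    "pKOS97-1628": (27697, 27987),
}


def span_length(start, end):
    """Number of bases in an inclusive range."""
    return end - start + 1


def is_covered(region, pieces):
    """True if the pieces tile the region from start to end without a gap.

    Adjacent pieces may share a junction base (as at a restriction site).
    """
    start, end = region
    reach = start - 1  # last base known to be covered so far
    for p_start, p_end in sorted(pieces):
        if p_start > reach + 1:
            break  # gap before this piece
        reach = max(reach, p_end)
    return reach >= end


lengths = [
    (span_length(*m6), span_length(*m2))
    for m6, m2 in zip(RECODED_MODULE6, MATCHING_MODULE2)
]
coverage = [is_covered(r, SYNTHESIZED.values()) for r in RECODED_MODULE6]
print(lengths, coverage)
```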



Assembly of the expression plasmid

First, ligation of the PstI-BamHI fragment of pKOS97-1613, the BamHI-BsmI
fragment of pKOS97-1622 and BsmI-PstI linearized pKOS97-90 produced
pKOS97-151. Then, insertion of the SfaNI-FseI fragment of pKOS97-1628
into pKOS97-151 gave rise to pKOS97-152. Then, the PstI-BlpI fragment of
pKOS97-125 was used to replace the PstI-BlpI fragment of pKOS97-90a,
producing pKOS97-160.

The final expression plasmid (in pRM5), pKOS97-162, resulted from insertion of
the BglII-NheI fragment of pKOS97-160 into the BglII-NheI sites of pKOS108-04.

Another expression plasmid, pKOS97-152a, was made by a four-fragment
ligation. The four fragments were a BlpI-XbaI fragment (containing a cos site) of
pKOS97-92a, a BglII-PstI fragment of pKOS97-81, a PstI-BlpI fragment of
pKOS97-152, and a BglII-XbaI fragment of pKOS108-04 (as the vector).

Tests of the constructed plasmids showed that the plasmids containing the 
modified coding sequences were more stable than plasmids containing unmodified 
coding sequence. 


Example 3
Construction of Ole-Meg Hybrid PKS
Construction of pRM1-based pKOS98-48 for the expression of OlePKS modules
1-4.

The 240-bp fragment containing the 3'-end portion of the oleAII gene (at nt
11210-11452; the first base of the start codon of oleAII is nt 1) was PCR amplified
with primers N98-38-1 (5'-GAACAACTCCTGTCTGCGGCCGCG-3') (SEQ ID
NO: 29) and N98-38-3
(5'-CGGAATTCTCTAGAGTCACGTCTCCAACCGCTTGTCGAGG-3') (SEQ ID
NO: 30). The fragment contains a naturally occurring NotI site at its 5'-end and
the engineered XbaI (TCTAGA) and EcoRI (GAATTC) sites at its 3'-end following the
oleAII stop codon. pKOS38-189 was digested with EcoRI and NotI to give five
fragments of 8 kb, 5 kb, 4 kb, 2.5 kb and 2 kb. The 8-kb EcoRI-NotI fragment
containing oleAII gene nt 2961 to nt 11210 and the 240-bp NotI- and EcoRI-treated
PCR fragment were ligated into litmus 28 at the EcoRI site via a three-fragment
ligation to give pKOS98-46. The 8.2-kb EcoRI fragment from pKOS98-46 was
cloned into pKOS38-174, a pRM1-derived plasmid containing oleAI and nt 1 to nt
2960 of oleAII, to give pKOS98-48.
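
The restriction sites described for the two primers can be located with a short string search. This is a minimal sketch with a hypothetical helper, `find_sites`; because the three recognition sequences used here (EcoRI GAATTC, XbaI TCTAGA, NotI GCGGCCGC) are palindromic, scanning one strand is sufficient. Positions are 0-based offsets from the 5'-end of each primer.

```python
# Recognition sequences of the enzymes mentioned in the text.
SITES = {"EcoRI": "GAATTC", "XbaI": "TCTAGA", "NotI": "GCGGCCGC"}

# Primer sequences as given above (SEQ ID NO: 29 and 30).
N98_38_1 = "GAACAACTCCTGTCTGCGGCCGCG"
N98_38_3 = "CGGAATTCTCTAGAGTCACGTCTCCAACCGCTTGTCGAGG"


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every site found in seq."""
    seq = seq.upper()
    hits = {}
    for name, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)  # allow overlapping matches
        if positions:
            hits[name] = positions
    return hits


print(find_sites(N98_38_1))
print(find_sites(N98_38_3))
```

The forward primer carries only the NotI site, and the reverse primer carries the EcoRI and XbaI sites near its 5'-end, which agrees with the description of the amplified fragment.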


Construction of pSET152-based pKOS98-60 for the expression of megPKS
modules 5-6.

The 360-bp fragment containing nt 1 to nt 366 of megAIII was PCR
amplified with primers N98-40-3
(5'-TCTAGACTTAATTAAGGAGGACACATATGAGCGAGAGCAGCGGCATGACCG-3')
(SEQ ID NO: 31) and N98-40-2 (5'-AACGCCTCCCAGGAGATCTCCAGCA-3')
(SEQ ID NO: 32). A PacI site and an NdeI site, as well
as the ribosome binding site, were introduced at the 5'-end of the megAI start
codon. The 360-bp PacI-BglII fragment was inserted into pKOS108-06, replacing
the 22-kb PacI-BglII fragment, to yield pKOS98-55. The 10-kb PacI-XbaI
fragment containing the megAIII gene and the annealed oligos N98-23-1
(5'-AATTCATAGCCTAGGT-3') (SEQ ID NO: 33) and N98-23-2
(5'-CTAGACCTAGGCTATG-3') (SEQ ID NO: 34) were ligated to the PacI- and
EcoRI-treated pSET152 derivative pKOS98-14 via a three-fragment ligation to give
pKOS98-60.
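
The way the two annealed oligos bridge the XbaI end of the insert and the EcoRI end of the vector can be checked by computing the duplex they form. The sketch below uses hypothetical helper names and assumes that both oligos carry 5' overhangs; it finds the longest region in which N98-23-1 pairs with N98-23-2 and reports the single-stranded ends that remain.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Oligo sequences as given above (SEQ ID NO: 33 and 34), written 5' to 3'.
N98_23_1 = "AATTCATAGCCTAGGT"
N98_23_2 = "CTAGACCTAGGCTATG"


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def annealed_overhangs(top, bottom):
    """Return (duplex, top 5' overhang, bottom 5' overhang).

    The duplex is the longest 3' portion of `top` that pairs with `bottom`;
    whatever is left at the 5'-end of each oligo is reported as an overhang.
    """
    bottom_rc = reverse_complement(bottom)
    for n in range(min(len(top), len(bottom)), 0, -1):
        if top[-n:] == bottom_rc[:n]:
            return top[-n:], top[:len(top) - n], bottom[:len(bottom) - n]
    return "", top, bottom


duplex, top_overhang, bottom_overhang = annealed_overhangs(N98_23_1, N98_23_2)
print(duplex, top_overhang, bottom_overhang)
```

The 5'-AATT overhang matches the end left by EcoRI, and the 5'-CTAG overhang matches the end left by XbaI, which is consistent with the three-fragment ligation described. The duplex also contains the sequence CCTAGG.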

Example 4

Conversion of Erythronolides to Erythromycins
A sample of a polyketide (~50 to 100 mg) is dissolved in 0.6 mL of
ethanol and diluted to 3 mL with sterile water. This solution is used to overlay a
three-day-old culture of Saccharopolyspora erythraea WHM34 (an eryA mutant)
grown on a 100 mm R2YE agar plate at 30°C. After drying, the plate is incubated
at 30°C for four days. The agar is chopped and then extracted three times with 100
mL portions of 1% triethylamine in ethyl acetate. The extracts are combined and
evaporated. The crude product is purified by preparative HPLC (C-18 reversed
phase, water-acetonitrile gradient containing 1% acetic acid). Fractions are
analyzed by mass spectrometry, and those containing pure compound are pooled,
neutralized with triethylamine, and evaporated to a syrup. The syrup is dissolved
in water and extracted three times with equal volumes of ethyl acetate. The
organic extracts are combined, washed once with saturated aqueous NaHCO3,
dried over Na2SO4, filtered, and evaporated to yield ~0.15 mg of product. The
product is a glycosylated and hydroxylated compound corresponding to
erythromycin A, B, C, and D but differing therefrom as the compound provided
differed from 6-dEB.

Example 5

Measurement of Antibacterial Activity

Antibacterial activity is determined using either disk diffusion assays with
Bacillus cereus as the test organism or measurement of minimum inhibitory
concentrations (MIC) in liquid culture against sensitive and resistant strains of
Staphylococcus pneumoniae.
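
For the liquid-culture option, an MIC is conventionally read as the lowest tested concentration at which no growth is seen, with no growth at any higher concentration either. The sketch below illustrates that reading rule only; the helper name `read_mic` is hypothetical, and the growth readings are made-up values for illustration, not data from this specification.

```python
def read_mic(growth_by_conc):
    """Return the MIC from {concentration: visible growth (True/False)}.

    The MIC is taken as the lowest concentration with no growth such that
    every higher tested concentration also shows no growth. Returns None
    when growth occurs even at the highest concentration tested.
    """
    mic_value = None
    for conc in sorted(growth_by_conc, reverse=True):
        if growth_by_conc[conc]:
            break  # growth seen: lower concentrations cannot be the MIC
        mic_value = conc
    return mic_value


# Hypothetical two-fold dilution series (concentrations in ug/mL).
sensitive = {0.125: True, 0.25: True, 0.5: True, 1: False, 2: False, 4: False}
resistant = {0.125: True, 0.25: True, 0.5: True, 1: True, 2: True, 4: True}

print(read_mic(sensitive), read_mic(resistant))
```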


Example 6
Evaluation of Antiparasitic Activity
Compounds can initially be screened in vitro using cultures of P. falciparum
FCR-3 and K1 strains, then in vivo using mice infected with P. berghei. Mammalian
cell toxicity can be determined in FM3A or KB cells. Compounds can also be
screened for activity against P. berghei. Compounds are also tested in animal studies
and clinical trials to test the antiparasitic activity broadly (against malaria,
trypanosomiasis and leishmaniasis).

The invention having now been described by way of written description
and example, those of skill in the art will recognize that the invention can be
practiced in a variety of embodiments and that the foregoing description and
examples are for purposes of illustration and not limitation of the following
claims.

Claims

1. An isolated nucleic acid comprising a nucleotide sequence
encoding a domain of megalomicin polyketide synthase (PKS) or a megalomicin
modification enzyme.

2. The isolated nucleic acid of claim 1, which encodes a PKS open
reading frame (ORF) selected from the group consisting of megAI, megAII and
megAIII.

3. The isolated nucleic acid of claim 1, wherein the PKS domain is
selected from the group consisting of a TE domain, a KS domain, an AT domain,
an ACP domain, a KR domain, a DH domain, and an ER domain.

4. The isolated nucleic acid of claim 1, wherein the nucleic acid
comprises the coding sequence for a loading module, a thioesterase domain, and
all six extender modules of megalomicin PKS.

5. The isolated nucleic acid of claim 1, which encodes a megalomicin
modification enzyme that is involved in the conversion of 6-dEB into a
megalomicin.

6. The isolated nucleic acid of claim 5, which encodes a megalomicin
modification enzyme that is involved in the biosynthesis of mycarose,
megosamine or desosamine.

7. The isolated nucleic acid of claim 1, wherein the nucleic acid
codons of homologous regions within the PKS or the megalomicin modification
enzyme coding sequence have been changed to reduce or abolish the homology
without changing the amino acid sequences encoded by said changed nucleic acid
codons.

8. The isolated nucleic acid of claim 1, which isolated nucleic acid
fragment hybridizes to a nucleic acid having a nucleotide sequence set forth in
SEQ ID NO: 1.

9. A polypeptide, which is encoded by the isolated nucleic acid
fragment of claim 1.

10. A recombinant DNA expression vector, comprising the isolated
nucleic acid of claim 1 operably linked to a promoter.

11. A recombinant host cell, comprising the recombinant DNA
expression vector of claim 10.

12. The recombinant host cell of claim 11, which is a Streptomyces or
Saccharopolyspora host cell.

13. A recombinant host cell of claim 11, which comprises:

a) at least two separate autonomously replicating recombinant DNA
expression vectors, each of said vectors comprises a recombinant DNA compound
encoding a megalomicin PKS domain or a megalomicin modification enzyme
operably linked to a promoter; or

b) at least one autonomously replicating recombinant DNA expression
vector and at least one modified chromosome, each of said vector(s) and each of
said modified chromosome comprises a recombinant DNA compound encoding a
megalomicin PKS domain or a megalomicin modification enzyme operably linked
to a promoter.

14. A hybrid PKS that comprises a polypeptide of claim 9 and is
composed of at least a portion of a megalomicin PKS and at least a portion of a
second PKS for a polyketide other than megalomicin.

15. The hybrid PKS of claim 14, wherein the second PKS is selected
from the group consisting of a narbonolide PKS, an oleandolide PKS, and a DEBS
PKS.

16. The hybrid PKS of claim 15 that is composed of the megAI and
megAII gene products and the oleAIII gene product.

17. The hybrid PKS of claim 16, wherein the KS domain of module 1
of the megAI gene product has been inactivated by mutation.

18. A method of producing a polyketide, which method comprises
growing the recombinant host cell of claim 11 under conditions whereby the
megalomicin PKS domain encoded by the recombinant expression vector is
produced and the polyketide is synthesized by the cell, and recovering the
synthesized polyketide.

19. A recombinant host cell that comprises a recombinant expression
vector that encodes a megalomicin modification enzyme.

20. The recombinant host cell of claim 19 that produces megosamine
and can attach megosamine to a polyketide, wherein said host cell, in its naturally
occurring non-recombinant state, cannot produce megosamine.

Megalomicin   A    H          H
              B    COCH3      H
              C1   COCH3      COCH3
              C2   COCH2CH3   COCH3

Erythromycin A; azithromycin (Azithromax)

Structures of the Megalomicins and Azithromycin
Figure 3

AT = acyltransferase
ACP = acyl carrier protein
KS = ketosynthase
KR = ketoreductase
DH = dehydratase
ER = enoyl reductase
TE = thioesterase

Biosynthesis of 6-Deoxyerythronolide B (6-dEB), the Aglycone of Erythromycin, by a
Modular PKS
Figure 4



Figure labels: DEBS enzymes; megalomicin A; erythromycin A; TDP-rhodosamine (8);
TDP-daunosamine (7); C-2 deoxygenation; D-glucose-1-OPO3; TDP-D-glucose (1);
TDP-4-keto-6-deoxy-D-glucose (2)

Erythromycin Biosynthetic Pathway and Megalomicin Biosynthesis
Figure 5

LOCUS                   47981 bp    DNA             01-MAY-2000
DEFINITION  Megalomicin biosynthetic gene cluster, polyketide synthase,
            desosamine, megosamine, and mycarose biosynthesis genes.
SOURCE      Micromonospora megalomicea.
  ORGANISM  Micromonospora megalomicea
            Unclassified.
REFERENCE   1  (bases 1 to 47981)
  AUTHORS   Volchegursky,Y., Hu,Z., Katz,L. and McDaniel,R.
  TITLE     Biosynthesis of the Anti-Parasitic Agent Megalomicin:
            Transformation of Erythromycin to Megalomicin in Saccharopolyspora
            erythraea
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 47981)
  AUTHORS   McDaniel,R. and Volchegursky,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-MAY-2000) Kosan Biosciences, Inc., 3828 Bay Center
            Place, Hayward, CA 94545, USA
FEATURES             Location/Qualifiers
     source          1..47981
                     /organism="Micromonospora megalomicea"
                     /strain="NRRL3275"
                     /sub_species="nigra"
     gene            complement(<1..144)
                     /gene="megT"
     CDS             complement(<1..144)
                     /gene="megT"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-4-keto-6-deoxyglucose-2,3-dehydratase"
                     /translation="MGDRVNGHATPESTQSAIRFLTRHGGPPTATDDVHDWLAHRAAE
                     CRLE" (SEQ ID NO: 2)

     gene            928..2061
                     /gene="megDVI"
     CDS             928..2061
                     /gene="megDVI"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-4-keto-6-deoxyhexose 3,4-isomerase"
                     /translation="MAVGDRRRLGRELQMARGLYWGFGANGDLYSMLLSGRDDDPWTW

YERLRAAGRGPYASRAGTWWGDHRTAAEVLADPGFTHGPPDAARWMQVAHCPAASWA 

GP FREFYARTEDAAS VTVDADWLQQRCARLVTELGS RFDLVNDFAREVP VLALGTAPA 

LKGVDPDRLRS WTS ATRVCIiDAQVS PQQLAVTEQ ALTALDE I DAVTGGRDAAVLVGW 

AELAANTVGNAVLAVTELPELAARLADDPETATRVVTEVSRTSPGVHLERRT^ 

VGGVDVPTGGEVTVVVAAANRDPEVFTDPDRFDVDRGGDAEILS SRPGS PRTDLDALV 

ATLATAALRAAAPVLPRLSRSGPVIRRRRSPVARGLSRCPVEL" (SEQ ID NO: 3) 

     gene            2072..3382
                     /gene="megDI"
     CDS             2072..3382
                     /gene="megDI"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-megosamine glycosyltransferase"
                     /translation="MRWFSSMAVNSHLFGLVPLASAFQAAGHEVRWASPALTDDVT

GAGLTAVPVGDDVELVEWHAHAGQDIVEYMRTLDWVDQSHTTMSWDDLLGMQTTFTPT 

FFALMSPDSLIDGMVEFCRSWRPDWIVWEPLTFAAPIAARVTGTPHARMLWGPDVATR 

ARQSFLRLLAHQEVEHREDPLAEWFDWTLRRFGDDPHLSroEELVLGKJWTVDPIPEPL 

RI DTGVRTVGMRYVPYNGPS WPAWLLREPERRRVCLTLGGS SREHG IGQVS IGEMLD 

Al ADIDAE FVATFDDQQLVGVGS VPANVRTAGFVPMNVLL PTCAATVHHGGTGS WLTA 

AIHGVPQIILSDADTEVHAKQLQDLGAGLSLPVAGMTAEHLRGAIERVLDEPAYRLGA 

ERMRDGMRTDPS PAQWG ICQDLAADRAARGRQ PRRTAE PHLPR " (SEQ ID NO: 4) 

     gene            3462..4634
                     /gene="megY"
     CDS             3462..4634
                     /gene="megY"
                     /codon_start=1
                     /transl_table=11
                     /product="mycarose O-acyltransferase"
                     /translation="MVTSTNLDTTARPALNSLTGMRFVAAFLVFFTHVLSRLIPNSYV
YADGLDAFWQTTGRVGVSFFFILSGFVLTWSARASDSVWSFWRRRVCKIjFPNHLVTAF 
AAWLFLVTGQAVSGEALIPNLLLIHAWFPALBISFGINPVSWSLACEAFFYLCFPLF 
LFWISGIRPERLWAWAAWFAAIWAVPWADLLLPSSPPLIPGLEYSAIQDWFLYTFP 
ATRSLEFILGIILARIIiITGRWINVGLLPAVLLFPVFFVASLFLPGVYAISSSMMILP 
LVLI I ASGATADLQQKRTFMRNRVMVWLGDVS FALYMVHFLVI VYGADLLGFSQTEDA 
• PLGLALFM 1 1 PFLAVSLVLS WLIjYRFVELPVMRNWARPAS ARRKPATEPEQTPSRR " 

(SEQ ID NO: 5)
     gene            4651..5775
                     /gene="megDII"
     CDS             4651..5775
                     /gene="megDII"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-3-keto-6-deoxyhexose 3-aminotransaminase"
/translation="MTTYVWSYLLEYERERADILDAVQKVFASGSLILGQSVENFETE 
YARYHG I AHCVGVDNGTNAVKLALE SVGVGRDDE WTVSNTAAPTVIjAIDE IGARPVF 
VDVRDEDYLMDTDLVEAAVTPRTKAIVPVHLYGQCVDMTALRELADRRGLKLVEDCAQ 
AHGARRDGRLAGTMSDAAAFSFYPTKVLGAYGDGGAVVTNDDETARALRRLRYYGMEE 
VY YVTRTPGHNSRLDE VQAE I LRRKLTRLDAYVAGRRAVAQRYVDGLADLQDSHGLEL 
P WTDGNEHVF YVYWRHPRRDE 1 1 KRLRDGYD I S LNI S YP WP VHTMTGF AHLCTASG 
SLPVTERLAGEIFSLPMYPSLPHDLQDRVIEAVREVITGL" (SEQ ID NO: 6)
     gene            5822..6595
                     /gene="megDIII"
     CDS             5822..6595
                     /gene="megDIII"
                     /codon_start=1
                     /transl_table=11
                     /product="daunosaminyl-N,N-dimethyltransferase"
                     /translation="MPNSHSTTSSTDVAPYERADIYHDFYHGRGKGYRAEADALVEVA

RKHTPQAATLLDVACGTGSHLVELADSFREWGVDLSAAMLATAARNDPGRELHQGDM 

RDFSLDRRFDWTCMFSSTGYLVDEAELDRAVANLAGHLAPGGTLWEPWWFPETFRP 

GWGADLVTSGDRRISRMSHTVPAGLPDRTASRMTIHYTVGSPEAGIEHFTEVHVMTL 

FARAAYEQAFQRAGLSCSYVGHDLFSPGLFVGVAAEPGR" (SEQ ID NO: 7) 

     gene            6592..7197
                     /gene="megDIV"
     CDS             6592..7197
                     /gene="megDIV"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-4-keto-6-deoxyhexose 3,5-epimerase"

/ trans lat ion= w MR VEELGIEGVFTFTPQTFADERGVFGTAYQEDVFVAAIjGRPLF 

PVAQVSTTRSRRGVVRGVHFTTMPGSMAKYVYCARGRAMDFAVDIRPGSPTFGRAEPV 

ELSAESMVGLYLPVGMGHLFVSLEDDTTLVYLMSAGYVPDKERAVHPLDPELALPIPA 

DLDLVMS ERDRVAPTLREARDQG I LPD YAACRAAAHRWRT " (SEQ ID NO: 8) 

     gene            7220..8206
                     /gene="megDV"
     CDS             7220..8206
                     /gene="megDV"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-4-keto-6-deoxyhexose 4-ketoreductase"
                     /translation="MWLGASGFLGSAVTHALADLPVRVRLVARREVWPSGAVADYE
THR VDLTEPGALAE WADARAVFPFAAQ IRGTSG WR ISEDD WAERTNVGLVRDL IAV 
LSRSPHAPWVFPGSNTQVGRVTAGRVIDGSEQDHPEGVYDRQKHTGEQLLKEATAAG 
AI RATSLRLPP VFGVPAAGTADDRG VVSTM I RRALTGQPLTMWHDGTVRRELLYVTDA 
ARAFVTAIiDHADALAGRHFLLGTGRSWPLGEVFQAVSRSVARHTGEDPVPVVSVPPPA 
HMDPSDLRSVEVDPARFTAVTGWRATVTMAEAVDRTVAALAPRRAAAPSEPS" (SEQ ID NO: 9)
     gene            complement(8228..9220)
                     /gene="megDVII"
     CDS             complement(8228..9220)
                     /gene="megDVII"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-4-keto-6-deoxyhexose 2,3-reductase"
                     /translation="MGTTGAGSARVRVGRSALHTSRLWLGTVNFSGRVTDDDALRLMD

HALERG VNC I DTAD I YG WRLYKGHTEELVGRWFAQGGGRREETVLATKVGS EMS ERVN 

DGGLSARHIVAACENSLRRLGVDHIDIYQTHHIDRAAPWDEVWQAAEHLVGSGKVGYV 

GSSNLAGWHIAAAQESAARRNLLGMISHQCLYNLAVRHPELDVLPAAQAYGVGVFAWS 

PLHGGLLSGVLEKLAAGTAVKSAQGRAQVLLPAVRPLVEAYEDYCRRLGADPAEVGLA 

WVXjSRPGILGAVIGPRTPEQLDSALRAAELTLGEEELRELEAIFPAPAVDGPVP" 

(SEQ ID NO: 10)
     gene            complement(9226..10479)
                     /gene="megBV"
     CDS             complement(9226..10479)
                     /gene="megBV"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-mycarose glycosyltransferase"
                     /translation="MRVLLTSFAHRTHFQGLVPLAWALHTAGHDVRVASQPELTDVW

GAGLTSVPLGSDHRLFDISPEAAAQVHRYTTDLDFARRGPELRSWEFLHGIEEATSRF 

VFPVVNlTOSFVDEL^FAMDWRPDLVLWEPFTFAGAVAAKACGAAHARIiLWGSDLTGY 

FRSRSQDLRGQRPADDRPDPLGGWLTEVAGRFGLDYSEDLAVGQWSVDQLPESFRLET 

GLESVHTRTLPYNGSSWPQWLRTSDGVRRVCFTGGYSALGITSNPQEFLRTLATLAR 

FDGEI VVTRSGtDPAS VPDNVRLVDFVPMNI LLPGCAAVIHHGGAGS WATALHHGVPQ . 

ISVAHEWDCVLRGQRTAELGAGVFLRPDEVDADTLWQALATVVEDRSHAENAEKLRQE .v 

ALAAPTPAEWPVLEALAHQHRADR" (SEQ ID NO: 11)
     gene            complement(10483..11424)
                     /gene="megBIV"
     CDS             complement(10483..11424)
                     /gene="megBIV"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-4-keto-6-deoxyhexose 4-ketoreductase"
                     /translation="MTRHVTLLGVSGFVGSALLREFTTHPLRLRAVARTGSRDQPPGS

AG I EHLRVDLLE PGRVAQ WADTDVWHLVAYAAGGSTWRS AATVPEAERVNAG I MRD 

LVAALRARPG PAPVLLFASTTQAANPAAPSRYAQHKI EAERI LRQATEDGWDGVI LR 

LPAIYGHSGPSGQTGRGWTAMIRRALAGEPITMWHEGSVRRMLLHVEDVATAFTAAIi 

HNHEALVGDVWTPSADEARPLGEIFETVAASVARQTGNPAVPWSVPPPENAEANDFR 

SDDFDSTEFRTLTGWHPRVPLAEGIDRTVAALISTKE" (SEQ ID NO: 12) 

gene 12181..22821

/gene="megAI"

CDS 12181..22821

/gene="megAI"

/note="polyketide synthase"

/codon_start=1

/transl_table=11

/product="megalomicin 6-deoxyerythronolide B synthase 1"

/translation="MVDVPDLLGTRTPHPGPLPFPWPLCGHNEPELRARARQLHAYLE

GISEDDVVAVGAALARETRAQDGPHRAVVVASSVTELTAALAALAQGRPHPSVVRGVA

RPTAPVVFVLPGQGAQWPGMATRLLAESPVFAAAMRACERAFDEVTDWSLTEVLDSPE

HLRRVEVVQPALFAVQTSLAALWRSFGVRPDAVLGHSIGELAAAEVCGAVDVEAAARA

AALWSREMVPLVGRGDMAAVALSPAELAARVERWDDDVVPAGVNGPRSVLLTGAPEPI

ARRVAELAAQGVRAQVVNVSMAAHSAQVDAVAEGMRSALTWFAPGDSDVPYYAGLTGG

RLDTRELGADHWPRSFRLPVRFDEATRAVLELQPGTFIESSPHPVLAASLQQTLDEVG
SPAAIVPTLQRDQGGLRRFLLAVAQAYTGGVTVDWTAAYPGVTPGHLPSAVAVETDEG
PSTEFDWAAPDHVLRARLLEIVGAETAALAGREVDARATFRELGLDSVLAVQLRTRLA
TATGRDLHIAMLYDHPTPHALTEALLRGPQEEPGRGEETAHPTEAEPDEPVAVVAMAC
RLPGGVTSPEEFWELLAEGRDAVGGLPTDRGWDLDSLFHPDPTRSGTAHQRAGGFLTG
ATSFDAAFFGLSPREALAVEPQQRITLELSWEVLERAGIPPTSLRTSRTGVFVGLIPQ
EYGPRLAEGGEGVEGYLMTGTTTSVASGRVAYTLGLEGPAISVDTACSSSLVAVHLAC
QSLRRGESTMALAGGVTVMPTPGMLVDFSRMNSLAPDGRSKAFSAAADGFGMAEGAGM 
LLLERLSDARRHGHPVLAVIRGTAVNSDGASNGLSAPNGRAQVRVIRQALAESGLTPH 
TVDWETHGTGTRLGDPIEARALSDAYGGDREHPLRIGSVKSNIGHTQAAAGVAGLrK

LVLAMQAGVLPRTLHADEPSPEIDV^SSGAISLLQEPAAWPAGERPRRAGVSSFGISGT 

NAHAIIEEAPPTGDDTRPDRMGPWPVaO,SASTGEALRARAARLAGHLREHPDQDLDD 

VAYSLATGRAALAYRSGFVPADASTALRILDELAAGGSGDAVTGTARAPQRWFVFPG 

QGWQWAGMAVDLLDGDPVFASVLRECADALEPYLDFEIVPFLRAEAQRRTPDHTLSTD 

RVD WQPVLFAVMVS LAARWRAYGVE P AAVIGHS QGE I AAAC VAGALS LDDAARAVAL 

RSRVI ATMPGNGAMAS I AASVDEVAARIDGRVE I AAVNGPRAVWSGDRDDLDRLVAS 

CIVEGVRAKRLPWYASHSSHVEAVRDALHAELGEFRPLPGFVPFYSTVTGRW^PAE 

LDAGYWFRNLRHRVRFADAVRSLADQGYTTFLEVSAHPVLTTAIEEIGEDRGGDLVAV 

HSLRRGAGGPVDFGSALARAFVAGVAVDWESAYQGAGARRVPLPTYPFQRERFWLEPN 

P ARRVADSDD VS S LRYR I E WHPTDPGE PGRLDGTWLLATYPGRADDRVEAARQ ALES A 

GARVEDIiWEPRTGRVDLVRRLDAVGPVAGVLCLFAVAEPAAEHSPLAVTSLSDTLDL 

TQAVAGSGRECPIWWTENAVAVGPFERLRDPAHGALWALGRWALENPAVWGGLVDV 

PSGSVAELSRHLGTTLSGAGEDQVALRPDGTYARRWCRAGAGGTGRWQPRGTVLVTGG 

TGGVGRHVARWLARQGTPCLVLASRRGPDADGVEELLTELADLGTRATVTACDVTDRE

QLRALLATVDDEHPLSAVFHVAATLDDGTVETLTGDRIERANRAKVLGARNLHELTRD 

ADLDAFVLFS SSTAAFG APGLGG YVPGNA YLDGLAQQRRS EGLPATS VAWGTWAGSGM 

AEGPVADRFRRHGVMEMHPDQAVEGLRVALVQGEVAPIWDIRWDRFLLAYTAQRPTR 

LFDTLDE ARRAAPG PDAGPG VAALAGLPVGERE KAVLDLVRTHAAAVLGHAS AEQVPV 

DRAFAELGVDSLSALELRNRLTTATGVRLATTTVFDHPDVRTLAGHLAAELGGGSGRE 

RPGGEAPTVAPTDEPIAIVGMACRLPGGVDSPEQLWELIVSGRDTASAAPGDRSWDPA 

ELMVSDTTGTRTAFGNFMPGAGEFDA-AFFGISPREAIjAMDPQQRHALETTWEALENAG 

IRPESLRGTDTGVFVGMSHQGYATGRPKPEDEVDGYLLTGNTASVASGRIAYVLGLEG 

PAITVDTACSSSLVALHVAAGSLRSGDCGLAVAGGVSVMAGPEVFREFSRQGALAPDG 

RCKPFSDEADGFGLGEGSAFWLQRLSVAVREGRRVLGVWGSAVNQDGASNGLAAPS 

GVAQQRV I RRAWGRAG VSGGD VG WEAHGTGTRLGDP VEXjGALLGTYGVGRGGVGP W 

VGSVKANVGHVQAAAGWGVIKVVLGLGRGLVGPMVCRGGLSGLVDWSSGGLVVADGV
RGWPVGVDGVRRGGVSAFGVSGTNAHVWAEAPGSWGAERPVEGSSRGLVGWGGW
PVVLSAKTETALHAQARRLADHLETHPDVPMTDVVWTLTQARQRFDRRAVLLAADRTQ
AVERLRGLu^GGEPGTGWSGVASGGGV^FVFPGQGGQWGMARGLLSVPVFVESVVEC
DAWSSWGFSVLGVLEGRSGAPSLDRVDWQPVLFWMVSLARLWRWCGWPAAWG
HSQGEIAAAWAGVLSVGDGARWALRARALRALAGHGGMASVRRGRDDVQKLLDSGP
WTGKLEIAAVNGPDAVWSGDPRAVTELVEHCDGIGVRARTIPVDYASHSAQVESLRE
ELLSVLAGIEGRPATVPFYSTLTGGFVDGTELDADYWYRNLRHPVRFHAAVEALAARD
LTTFVEVSPHPVLSMAVGETLADVESAVTVGTLERDTDDVERFLTSLAEAHVHGVPVD
WAAVLGSGTLVDLPTYPFQGRRFWLHPDRGPRDDVADWFHRVDWTATATDGSARLDGR
WLVWPEGYTDDGWWEVRAALAAGGAEPWTTVEEVTDRVGDSDAWSMLGLADDGA
AETLALLRRLDAQASTTPLWWTVGAVAPAGPVQRPEQATVWGLALVASLERGHRWTG
LLDLPQTPDPQLRPRLVEALAGAEDQVAVRADAVHARRIVPTPVTGAGPYTAPGGTIL
VTGGTAGLGAVTARWLAERGAEHLALVSRRGPGTAGVDEWRDLTGLGVRVSVHSCDV
GDRESVGALVQELTAAGDVVRGVVHAAGLPQQVPLTDMDPADLADWAVKVRTjGAVHLA
DLCPEAELFLLFSSGAGVWGSARQGAYAAGNAFLDAFARHRRDRGLPATSVAWGLWAA
GGMTGDQEAVSFLRERGVRPMSVPRALEALERVLTAGETAVVVADVDWAAFAESYTSA
RPRPLLHRLVTPAAAVGERDEPREQTLRDRLAALPRAERSAELVRLVRRDAAAVLGSD

AKAVPATTPFKDLGFDSLAAVRFRNRLAAHTGLRLPATLVFEHPNAAAVADLLHDRLG

EAGEPTPVRSVGAGLAALEQALPDASDTERVELVERLERMLAGLRPEAGAGADAPTAG

DDLGEAGVDELLDALERELDAR" (SEQ ID NO: 13)

misc_feature 12505..13470
/gene="megAI"
/function="AT-L"

misc_feature 13576..13791
/gene="megAI"
/function="ACP-L"

misc_feature 13849..15126
/gene="megAI"
/function="KS1"

misc_feature 15427..16476
/gene="megAI"
/function="AT1"

misc_feature 17155..17694
/gene="megAI"
/function="KR1"

misc_feature 17947..18207
/gene="megAI"
/function="ACP1"

misc_feature 18268..19548
/gene="megAI"
/function="KS2"

misc_feature 19876..20910
/gene="megAI"
/function="AT2"

misc_feature 21517..22053
/gene="megAI"
/function="KR2"

misc_feature 22318..22575
/gene="megAI"
/function="ACP2"

gene 22867..33555
/gene="megAII"
CDS 22867..33555
/gene="megAII"
/note="polyketide synthase"
/codon_start=1
/transl_table=11

/product="megalomicin 6-deoxyerythronolide B synthase 2"

/translation="MTDNDKVAEYLRRATLDLRAARKRLRELQSDPIAWGMACRLPG

G VHL PQHLWDLLRQGHETVS TFPTGRGWDLAGLFHPDPDHPGTS YVDRGGFLDDVAGF 

DAEFFGISPREATAMDPQQRLLLETSWELVESAGIDPHSLRGTPTGVFLGVARLGYGE 

NGTEAGDAEGYSVTGVAPAVASGRIS YALGLEGPS I SVDTACS S SLVALHLAVESLRL 

GESSLAWGGAAVMATPGVFVDFSRQRALAADGRSKAFGAAADGFGFSEGVSLVLLER 

LSEAESNGHEVLAVIRGSALNQDGASNGLAAPNGTAQRKVIRQALRNCGLTPADVDAV 

EAHGTGTTLGDPIEANALLDTYGRDRDPDHPLWLGSVKSNIGHTQAAAGVTGLLKMVL 

ALRHEE LPATLHVDEPT PHVDWS SGAVRLATRGRPWRRGDRPRRAGVS AFG I SGTNAH 

VIVEEAPERTTERTVGGDVGPVPLWSARSAAALRAQAAQVAELVEGSDVGLAEVGRS 

IiAVTRARHEHRAAWASTRAEAVRGIiREVAAVEPRGEDTVTGVAETSGRTVVFLFPGQ 

GSQWVGMGAELLDSAPAFADTIRACDEAMAPLQDWSVSDVLRQEPGAPGLDRVDWQP 

VLFAVMVSLARLWQS YGVTPAAWGHSQGE I AAAHVAGALS LADAARLWGRS RLtiRS 

LSGGGGMSAVALGEAEVRRRLRS WEDR I S VAAVNGPRS VWAGEPEALREWGREREAE 

GVRVREIDVDYASHSPQIDRVRDELLTVTGEIEPRSAEITFYSTVDVRAVDGTDLDAG 

YWYRNLRETVRFADAMTRLADSGYDAFVEVSPHPVVVSAVAEAVEEAGVEDAVVVGTL 

SRGDGGPGAFLRSAATAHCAGVDVDWTPALPGAATIPLPTYPFQRKPYWLRSSAPAPA 

SHDLAYRVSWTPITPPGDGVLDGDWLVVHPGGSTGWVDGLAAAITAGGGRVVAHPVDS 

VTSRTGLAEALARRDGTFRGVLS WVATDERHVEAGAVALLTLAQALGDAG IDAPLWCL 

TQEAVRTPVDGDLARPAQAALHGFAQVARLELARRFGGVLDLPATVDAAGTRLVAAVL 

AGGGEDWAVRGDRLYGRRLVRATLPPPGGGFTPHGTVLVTGAAGPVGGRLARWLAER 

GATRLVLPGAHPGEELLTAI RAAGATAWCE PE AEALRTAIGGELPTALVHAETLTNF 

AGVADADPEDFAATVAAKTALPTVLAEVLGDHRLEREVYCSSVAGVWGGVGMAAYAAG 

SAYLDALVEHRRARGHASASVAWTPWALPGAVDDGRLRERGLRSLDVADALGTWERIiL 

RAGAVSVAVADVDWSVFTEGFAAIRPTPLFDELLDRRGDPDGAPVDRPGEPAGEWGRR 

IAALSPQEQRETLLTLVGETVAEVLGHETGTE1NTRRAFSELGLDSLGSMALRQRLAA 

RTGLRMPASLVFDHPTVTALARYLRRLWGDSDPTPVRVFGPTDEAEPVAWGIGCRF 

PGG I AT PEDLWR WSEGTS ITTG FPTDRGWDLRRLYHPDPDHPGTS YVDRGGFLDGAP 

D FDPGFFGITPRE ALAMDPQQRLTLE I AWE AVERAG I D PETLLGSDTGVFVGMNGQS Y 

LQLLTGEGDRLNGYQGLGNSASVLSGRVAYTFGWEGPALTVDTACSSSLVAIHLAMQS 

LRRGECSLALAGGVTVMADPYTFVDFSAQRGLAAIXjRCKAFSAQADGFAIiAEGVAALV 

LEPLSKARRNGHQVLAVLRGSAVNQDGASNGLAAPNGPSQERVIRQALTASGLRPADV 

DMVEAHGTGTELGDPIEAGALIAAYGRDRDRPLWIiGSVKTNIGHTQAAAGAAGVIKAV 

LAMRHGVLPRSLHADELSPHIDWADGKVEVLREARQWPPGERPRRAGVSSFGVSGTNA 

HVIVEEAPAEPDPEPVPAAPGGPLPFVLHGRSVQTVRSQARTLAEHLRTTGHRDLADT 

ARTLATGRARFDVRAAVIiGTDREGVCAALDAIjAQDRPSPDVVAPAVFAARTPVIiVFPG 

QGSQWVGMARDLliDSSEVFAESMGRCAEALSPYTDWDXiLDWRGVGDPDPYDRVDVIiQ 

PVLFAVMVSIJ^LWQSYGVTPGAWGHSQGEIAAAHVAGALSLADAARVVALRSRVLR 

ELDDQGGMVS VGTS RAELD S VLRRWDGRVAVAAVNGPGTL WAG PTAELDE FLAVAEA 

REMRPRRIAVRYASHSPEVARVEQRLAAELGTVTAVGGTVPLYSTATGDLLDTTAMDA 

GYWYRNLRQPVLFEHAVRSLLERGFETFIEVSPHPVLLMAVEETAEDAERPVTGVPTL 

RRDHDG PS EFLRKLLG AHVHGVD VDLRPAVAHGRLVDLPTYPFDRQRLW PKPHRRADT 

SSIiGVRDSTHPLLHAAVDVPGHGGAVFTGRLSPDEQQWIiTQHVVGGRNLVPGSVLVDL 

ALTAGADVGVPVLEEIiVLQQPIiVLTAAGALLRLSVGAADEDGRRPVEIHAAEDVSDPA 

EARWSAYATGTLAVGVAGGGRDGTQWPPPGATALTLTDHYDTLAELGYEYGPAFQALR 



AAWQHGDWYAEVSLDAVEEGYAFDPVLLDAVAQTFGLTSRAPGKLPFAWRGVTLHAT
GATAVRWATPAGPDAVALRVTDPTGQLVATVDALVVRDAGADRDQPRGRDGDLHRLB 
WVRLATPDPTPAAWHVAADGLDDLLRAGGPAPQAVWRYRPDGDDPTABARHGVLWA 
ATLVRRWLDDDRWPATTLVVATS AGVEVS PGDDVPRPGAAAVWGVLRCAQAES PDRFV 
LVDGDPETPPAVPDNPQLAVRDGAVFVPRLTPLAGPVPAVADRAYRLVPGNGGSIEAV 
AFAPVPDADRPLAPEEVRVAVRATGVNFRDVLLALGMYPEPAEMGTEASGWTEVGSG 
VRRFTPGQAVTGLFQGAFGPVAVADHRLLTPVPDGWRAVDAAAVP I AFTTAHY ALHDL 
AGLQAGQS VLVHAAAGGVGMAAVALARRAGAE VFATAS PAKHPTLRALGLDDDH I AS S 
RESGFGERFAARTGGRGVDWLNSLTGDLLDESARLLADGGVFVEMGKTDLRPAEQFR 
GRYVPFDLAEAGPDRLGEILEEWGLLAAGALDRLPVSVWELSAAPAALTHMSRGRHV 
GKLVLTQPAPVHPDGTVLVIGGTGTLGRLVARHL\rrGHG\^HLLVASRRGPAAPGAAE 
LRADVEGLGATI EI VACDTADREALAALLDS I PADRPLTGWHTAGVLADGLVTS IDG 
TATDQVLRAKVDAAWHLHDLTRDADLS FFVLFSS AASVLAGPGQGVYAAANGVLNALA 
GQRRALGLPAKALGWGLWAQASEMTSGLGDRIARTGVAALPTERALALFDAALRSGGE 
VLFPLSVDRSALRRAEYVPEVLRGAVRSTPRAANRAETPGRGLLDRLVGAPETDQVAA 
liAELVRSHAAAVAGYDSADQLPERKAFKDLGFDSIiAAVELRNRLGVTTGVRLPSTIjVF 
DHPTPLAVAEHLRSELFADSAPDVGVGARLDDLERALDALPDAQGHADVGARLEALLR 
RWQSRRPPETEPVTISDDASDDELFSMLDRRLGGGGDV" (SEQ ID NO: 14) 

misc_feature 22957..24237
/gene="megAII"
/function="KS3"

misc_feature 24544..25581
/gene="megAII"
/function="AT3"

misc_feature 26230..26733
/gene="megAII"
/function="KR3 (inactive)"

misc_feature 26998..27258
/gene="megAII"
/function="ACP3"

misc_feature 27393..28590
/gene="megAII"
/function="KS4"

misc_feature 28897..29931
/gene="megAII"
/function="AT4"

misc_feature 29953..30477
/gene="megAII"
/function="DH4"

misc_feature 31396..32244
/gene="megAII"
/function="ER4"

misc_feature 32257..32799
/gene="megAII"
/function="KR4"

misc_feature 33052..33312
/gene="megAII"
/function="ACP4"

gene 33666..43271

/gene="megAIII"

CDS 33666..43271

/gene="megAIII"

/note="polyketide synthase"

/codon_start=1

/transl_table=11

/product="megalomicin 6-deoxyerythronolide B synthase 3"

/translation="MSESSGOTEDRLRRYLKKTVAELDSVTGRLDEVEYRAREPIAVV

GMACRFPGGVDSPEAFWEFIRDGGDAIAEAPTDRGWPPAPRPRLGGLLAEPGAFDAAF

FGIS PREALATDPQQRLMLE I S WEALERAGFDPS SLRGSAGGVFTGVGAVDYGPRPDE 

APEEVLGYVGIGTASSVASGRVAYTLGLEGPAVTVI)TACSSGLTAV^^ 

TLVXAGGVTVTCSSPGAFTEFRSQGGLAEDGRCKPFSRAADGFGIA^ 

ARAEGRPVTAVXRGSAINQDGASNGLTAPSGPAQRRVIRQALERARLRPVDVDYVEAH 

GTGTRLGDPIEAHALLDTYGADREPGRPLWVGSV7CSNIGHTQAAAGVAGW 

HREIPATLHFDEPSPHVDWDRGAVSWSETRPWPVGERPRRAGVSSFGISGTNAHVIV
EEAPSPQAADLDPTPGPATGATPGTDAAPTAEPGAEAVALVFSARDERALRAQAARLA

DRliTDD PAP S LRDTAFTLVTRRATWEHRAVWGGGE EVLAGLRAVAGGRP VDGAVSGR 

ARAGRRWLVFPGQGAQXQGMARDLLRQS PTFAES IDACERALAPHVDWSLREVLDGE 

QSLDPVDVVQPVLFA^WSLARLWQSYGVTPGAWGHSQGEIAAAHVAGALSLADAAR 

WALRSRVLRRLGGHGGI4ASFGLHPDQAAERIARFAGALTVASVNGPRSVVLAGENGP 

LDELIAECEAEGVTARRIPVDYASHSPQVESLREELLAALAGVRPVSAGIPLYSTLTG 

QVI ETATMDADYWFANLREPVRFQDATRQLAEAGFDAFVEVS PHPVLTVGVEATLEAV 

LPPDADPCVTGTLRRERGGLAQFHTAIAEAYTRGVEVDWRTAVGEGRPVDLPVYPFQR 

QNFWLPVPLGRVPDTGDEWRYQIAWHPVDLGRSSLAGRVLVVTGAAVPPAWTDVVRDG 

LEQRGAT\A^CTAQSRARIGAALDAVDGTALSTWSLI^LAEGGAVDDPSLDTXALVQ 

ALG AAG ID VPLWLVTRDAAAVTVGDD VD PAQAMVGGLGR WG VES PAR WGGLVDLREA 

DADSARSLAAILADPRGEEQFAIRPDGVTVARLVPAPARAAGTRWTPRGTVLVTGGTG 

G IGAHIJVRWLAGAGAEHLiVLLNRRGAE AAGAADLRDELVALGTG VT I TACDVADRDRL 

AAVLDAARAQGRWTAVFHAAG I SRSTAVQELTE S E FTE I TDAKVRGTANLAELCPEIj 

DALVLFS SNAAVWGS PGLAS YAAGNAFLDAFARRGRRSGLPVTS I AWGLWAGQNMAGT 

EGGD YLR SQGLRAMDPQRAI EELRTTIjDAGDPWVS WDLDRERFVEL FTAARRRPLFD 

ELGGVRAGAEETGQESDLARRLASMPEAERHEHVARLVRAEVAAVLGHGTPTVIERDV 

AFRDIXSFDSMTAVDLP^LAAVTGVRVATTIVFDHPTVDRLTAHYLERLVGEPEATTP 

AAAWPQAPGEADEPIAIVGMACRIiAGGVRTPDQLWDFIVADGDAVTEMPSDRSWDLD 

ALFDPDPERHGTSYSRHGAFLDGAADFDAAFFGISPREALAMDPQQRQVLETTWELFE 

NAGIDPHSLRGTDTGVFLGAAYQGYGQNAQVPKESEGYLLTGGSSAVASGRIAYVLGL 

EGPAITVDTACSSSLVALHVAAGSI#RSGDCGLAVAGGVSVMAGPEVFTEFSRQGAIiAP 

DGRCKPFSDQADGFGFAEGVAWLLQRLSVAVREGRRVLGVWGSAVNQDGASNGLAA 

PSGVAQQRVIRRAWGRAGVSGGDVGWEAHGTGTRLGDPVELGALLGTYGVGRGGVGP 

VWGSVKANVGHVQAAAGWGVI KWLGLGRGLVG PMVCRGGLSGLVDWSSGGLWAD 

GVRGWPVGVDGVRRGGVSAFGVSGTNAHVWAEAPGSWGAERPVEGSSRGLVGVAGG 

WPVVLSAKTETALTELARRLHDAVDDTVALPAVAATLATGRAHLPYRAALLARDHDE 

LRDRLRAFTTGSAAPGWSGVASGGGWFVFPGQGGQWVGMARGLLSVPVFVESWEC 

DAWSSWGFSVLGVLEGRSGAPSLDRVDWQPVLFVV>fVSLARLWRWCGVVPAAVVG 

HSQGE I AAAWAG VLSVGDG ARWALRARALRALAGHGGMVS LAVS AERAREL I APWS 

DRISVAAVNSPTSVWSGDPQALAALVAHCAETGERAKTLPVDYASHSAHVEQIRDTI 

LTDLADVTARRPDVALYSTLHGARGAGTDMDARYWYDNLRSPVRFDEAVEAAVADGYR 

VFVEMSPHPVLTAAVQEIDDETVAIGSLHRDTGERHLVAELARAHVHGVPVDWRAILP 

ATHPVPLPNYPFEATRYWLAPTAADQVADHRYRVDWRPLATTPAEIiSGSYIiVFGDAPE 

TLGHSVEKAGGLLVPVAAPDRESLAVALDEAAGRLAGVLSFAADTATHLARHRLLGEA 

DVEAPLWLVTSGGVALDDHDPIDCDQAMVWGIGRVhlGLETPHRWGGLVDVTVEPTAED 

G WFAALLAADDHEDQ VALRDG I RHG RRLVRAPLTTRN ARWT P AGT AL VTGGTGALGG 

HVARYLARSGVTDLVLLSRSGPDAPGAAELAAELADLGAEPRVEACDVTDGPRI-RALV 

QELREQDRPVRIWHTAGVPDSRPLDRIDELESVSAAKVTGARLLDELCPDADTFVLF 

SSGAGVWGSANLGAYAAANAYLDALAHRRRQAGRAATSVAWGAWAGDGMATGDLDGLT

RRGLRAMAPDRAIiRACTRRWTTHDTCVSVADVDWDRFAVGFTAARPRPLIDELVTSAP 

VAAPTAAAAPVPAMTADQLLQFTRSHVAAILGHQDPDAVGLDQPFTELGFDSLTAVGZi 

RNQLQQATGRTLPAALVFQHPTVRRLADHLAQQLDVGTAPVEATGSVLRDGYRRAGQT 

GDVRSYI^LLANLSEFRERFTDAASLGGQLELVDLADGSGPVTVICCAGTAAIiSGPHE 

FARLASALRGTVPVRALAQPGYEAGEPVPASMEAVLGVQADAVLAAQGDTPFVLVGHS

AGALMAYALATELADRGHPPRGVVLLDVYPPGHQEAVHAWLGELTAALFDHETVRMDD 

TRIiTALGAYDRLTGRWRPRDTGLPTLWAASEPMGEWPDDGWQSTWPFGHDRVTVPGD 

HFSMVQEHADAIARHIDAWLSGERA" (SEQ ID NO: 15) 



misc_feature 33780..35027
/gene="megAIII"
/function="KS5"

misc_feature 35385..36419
/gene="megAIII"
/function="AT5"

misc_feature 37068..37604
/gene="megAIII"
/function="KR5"

misc_feature 37860..38120
/gene="megAIII"
/function="ACP5"

misc_feature 38187..39470
/gene="megAIII"
/function="KS6"

misc_feature 39795..40811
/gene="megAIII"
/function="AT6"

misc_feature 41406..41936
/gene="megAIII"
/function="KR6"

misc_feature 42168..42425
/gene="megAIII"
/function="ACP6"

misc_feature 42585..43271
/gene="megAIII"
/function="TE"

gene 43268..44344

/gene="megCII"
CDS 43268..44344

/gene="megCII"

/codon_start=1

/transl_table=11

/product="TDP-4-keto-6-deoxyglucose 3,4-isomerase"

/translation="MNTTDRAVLGRRLQMIRGLYWGYGSNGDPYPMLLCGHDDDPHRW

YRGLGGSGVRRSRTETWVVTDHATAVRVLDDPTFTRATGRTPEWMWUVGAPASTWAQP 

FRDVHAASWDAELPDPQEVEDRLTGLLPAPGTRLDLVRDLAWPMASRGVGADDPDVLR 

AAWDARVGLDAQLTPQPLAVTEAAIAAVPGDPHRRALFTAVEMTATAFVDAVLAVTAT 

AGAAQRLADDPDVAARLVAEVLRLHPTAHLERRTAGTETWGEHTVAAGDEVWWAA 

ANRDAGVFADPDRLDPDRADADRALSAQRGHPGRLEELVWLTTAALRSVAKALPGLT 

AGGPWRRRRSPVLRATAHCPVEL" (SEQ ID NO: 16) 

gene 44355..45623

/gene="megCIII"

CDS 44355..45623

/gene="megCIII"

/codon_start=1

/transl_table=11

/product="TDP-desosamine glycosyltransferase"

/translation="MRVVFSSMASKSHLFGLVPLAWAFRAAGHEVRVVASPALTDDIT

AAGLTAVPVGTDVDLVDFMTHAGYDIIDYVRSLDFSERDPATSTWDHLLGMQTVLTPT 

FYALMSPDSLVEGMISFCRSWRPDWSSGPQTFAASIAATVTGVAHARLLWGPDITVRA 

RQKFLGLLPGQPAAHREDPLAEWLTWSVERFGGRVPQDVEELWGQWTIDPAPVGMRL 

DTGLRTVGMRYVDYNGPSWPDWLHDEPTRRRVCLTLGISSRENSIGQVSVDDLLGAL 

GDVDAEIIATVDEQQLEGVAHVPANIRTVGFVPMHALLPTCAATVHHGGPGSWHTAAI

HGVPQVILPDGWDTGVRAQRTEDQGAGIALPVPELTSDQLREAVRRVLDDPAFTAGAA

RMRADMLAEPSPAEWDVCAGLVGERTAVG" (SEQ ID NO: 17)

gene 45620..46591

/gene="megBII"

CDS 45620..46591

/gene="megBII"
/codon_start=1
/transl_table=11

/product="TDP-4-keto-6-deoxyglucose 2,3 dehydratase"

/translation="MSTDATHVRLGRCALLTSRLWLGTAALAGQDDADAVRLLDHARS

RGVNCLDTADDDSASTSAQVAEESVGRWLAGDTGRREETVLSVTVGVPPGGQVGGGGL 

SARQIIASCEGSLRRLGVDHVDVLHLPRVDRVEPWDEVWQAVDALVAAGKVCYVGSSG 

FPGWHIVAAQEHAVRRHRLGLVSHQCRYDLTSRHPELEVLPAAQAYGLGVFARPTRLG 

GLLGGDGPGAAAARASGQPTALRSAVEAYEVFCRDLGEHPAEVALAWVLSRPGVAGAV 

VGARTPGRLDSALRACGVALGATELTALDGIFPGVAAAGAAPEAWLR" (SEQ ID NO: 18)

gene complement(46660..47403)

/gene="megH"

CDS complement(46660..47403)

/gene="megH"

/note="putative thioesterase"
/codon_start=1
/transl_table=11
/product="TEII"

/translation="MNTWLRRFGSADGHRARLYCFPHAGAAADSYLDLARALAPEVDV
WAVQYPGRQDRRDERALGTAGEIADEVAAVLRDLVGEVPFALFGHSMGALVAYETARR 
LEARPGVRPLRLFVSGQTAPRVHERRTDLPDEDGLVEQMRRLGVSE
LPVLRADHRVLRSYAWQAGPPLRAGITTLCGDTDPLTTVEDAQRWLPYSWPGRTRTF

PGGHFYLADHVGEVAESVAPDLLRLTPTG" (SEQ ID NO: 19)
gene complement(47411..>47981)

/gene="megF"
CDS complement(47411..>47980)

/gene="megF"

/codon_start=1

/transl_table=11

/product="C-6 hydroxylase"

/translation="IRVQDDDADRLSRDELTSIALVLLLAGFEASVSLIGIGTYLLLT
HPDQLALVRKDPALLPGAVEEILRYQAPPETTTRFATAEVEIGGVTIPAYSTVLIANG
AANRDPGQFPDPDRFDVTRDSRGHLTFGHGIHYCMGRPLAKLEGEVALGALFDRFPKL
SLGFPSDEWWRRSLLLRGIDHLPVRPNG" (SEQ ID NO: 20)

BASE COUNT 5962 a 16875 c 18045 g 7099 t

ORIGIN 

1 ctcgagccga tgctcggcgg cgcggtgggc caaccagtcg tggacgtcgt cggtggcggt 
61 gggaggtccg ccgtgccgag tcaggaaacg tattgccgat tgtgtggatt ccggagtcgc 
121 atgaccgttg acccgatccc ccatacgcct ctcccgtgat gtcgtgggcg gtccgtgcgg 
181 taccgcccgg actgacattc gtcgatcaag accccgccca gtgtagggct ccgcccgcga 
241 cgggagaagg tccgtcgaac aacttccggg tgaccggtcg ccggcgtcgg tgaaacgggc 
301 gtcggagcac ccgatcattg ctgtcggtga acttcctaac tgtcggcgcg cacatctttc 
361 tgaccggtgt gttccgtggt atgacgcgtt cccggcccgt ctggaactgt gcgtgggact 
421 gaccggttgc ggcgtgtttt cgcccgtttc cgaactgcgg attcgtcgat cgcgcaggtg 
481 ggagcgggtg gctgaccggg atgatctgca atcatggcgc tcaatgacga tctcttgtag 
541 catggtccgc gccgagggtc cgacaggccc gaaacgcccg gcatccagcc tgttcgacga 
601 cgtcgacatc accgtgcaag ccgcgatgac accgacacca cgccatgctg gtgccgcact 
661 ggaagggtgg cgcgatcagg gaaatggccg tgtcactaga cagacgccaa acagctgtcc 
721 gggcctgcgg aaacagcatc gatctgcgtc agccgttcat tgccccggcg gcaccgcctt 
781 ggaaatccgt gccaccggtc gtccgcagtg acgatcgcgg acccgggttt cgagacagca 
841 ggtagtaggc gatgcaggcg tttcgtctcg cgccggacgc gtcgcactag gtggaatccg 
901 tcacagtctt caatccggga gcgttctatg gcagttggcg atcgaaggcg gctgggccgg 
961 gagttgcaga tggcccgggg tctctactgg gggttcggtg ccaacggcga tctgtactcg 
1021 atgctcctgt ccggacggga cgacgacccc tggacctggt acgaacggtt gcgggccgcc 
1081 ggacggggac cgtacgccag tcgggccgga acgtgggtgg tcggtgacca ccggaccgcc 
1141 gccgaggtgc tcgccgatcc gggcttcacc cacggcccgc ccgacgctgc ccggtggatg 
1201 caggtggccc actgcccggc ggcctcctgg gccggcccct tccgggagtt ctacgcccgc 
1261 accgaggacg cggcgtcggt gacagtggac gccgactggc tccagcagcg gtgcgccagg 
1321 ctggtgaccg agctggggtc gcgcttcgat ctcgtgaacg acttcgcccg ggaggtcccg 
1381 gtgctggcgc tcggtaccgc gcccgcactc aagggcgtgg accccgaccg tctccggtcc 
1441 tggacctcgg cgacccgggt atgcctggac gcccaggtca gcccgcaaca gctcgcggtg 
1501 accgaacagg cgctgaccgc cctcgacgag atcgacgcgg tcaccggcgg tcgggacgcc 
1561 gcggtgctgg tgggggtggt ggcggagctg gcggccaaca cggtgggcaa cgccgtcctg 
1621 gccgtcaccg agcttcccga actggcggca cgacttgccg acgacccgga gaccgcgacc 
1681 cgtgtggtga cggaggtgtc gcggacgagt cccggcgtcc acctggaacg ccgcaccgcc 
1741 gcgtcggacc gccgggtggg cggggtcgac gtcccgaccg gtggcgaggt gacagtggtc 
1801 gtcgccgcgg cgaaccgtga tcccgaggtc ttcaccgatc ccgaccggtt cgacgtggac 
1861 cgtggcggcg acgccgagat cctgtcgtcc cggcccggct cgccccgcac cgacctcgac 
1921 gccctggtgg ccaccctggc cacggcggcg ctgcgggccg ccgcgccggt gttgccccgg 
1981 ctgtcccgtt ccgggccggt gatcagacga cgtcggtcac ccgtcgcccg tggtctcagc 
2041 cgttgcccgg tcgagctgta gaggaagaac gatgcgcgtc gtgttttcat cgatggctgt 
2101 caacagccat ctgttcgggc tggtcccgct cgcaagcgcc ttccaggcgg ccggacacga 
2161 ggtacgggtc gtcgcctcgc cggccctgac cgacgacgtc accggtgccg gtctgaccgc 
2221 cgtgcccgtc ggtgacgacg tggaacttgt ggagtggcac gcccacgcgg gccaggacat
2281 cgtcgagtac atgcggaccc tcgactgggt cgaccagagc cacaccacca tgtcctggga 
2341 cgacctcctg ggcatgcaga ccaccttcac cccgaccttc ttcgccctga tgagccccga 
2401 ctcgctcatc gacgggatgg tcgagttctg ccgctcctgg cgtcccgact ggatcgtctg 
2461 ggagccgctg accttcgccg ccccgatcgc ggcccgggtc accggaaccc cgcacgcccg 
2521 gatgctgtgg ggtccggacg tcgccacccg ggcccggcag agcttcctgc gactgctggc 
2581 ccaccaggag gtggagcacc gggaggatcc gctggccgag tggttcgact ggacgctgcg 
2641 gcgcttcggc gacgacccgc acctgagctt cgacgaggaa ctggtgctgg ggcagtggac 
2701 cgtggacccc atccccgagc cgctgcggat cgacaccggc gtccggacgg tgggcatgcg 
2761 gtacgtcccc tacaacggcc cctcggtggt gcccgcctgg ctgttgcggg aacccgaacg 
2821 tcggcgggtc tgcctgaccc tcggcggttc cagccgggaa cacggcatcg ggcaggtctc 
2881 catcggcgag atgttggacg ccatcgccga catcgacgcc gagttcgtgg ccaccjttcga
2941 cgaccagcag ttggtcggcg tgggcagcgt tccggcaaac gtccgtaccg ccgggttcgt
3001 gccgatgaac gtcctgctgc ccacctgcgc ggccaccgtg caccacggcg gcaccggcag
3061 ttggctgacc gccgccatcc acggcgtacc gcagatcatc ctctcggacg ccgacaccga
3121 ggtgcacgcc aagcagctcc aggacctcgg cgcggggctg tcgctcccgg tcgcggggat
3181 gaccgccgag cacctgcgtg gggcgatcga gcgggttctc gacgagccgg cgtaccgcct
3241 cggtgcggag cggatgcggg acgggatgcg gaccgacccg tcgccggccc aggtggtcgg
3301 catctgtcag gacctggccg ccgaccgggc ggcacgcggc aggcagccgc gtcgaaccgc
3361 cgagccgcac ctgccgcgat gacttccacc accaccggga ccggctgatg ccggtcccgg
3421 aatccacacg ccgactttcc ttctgacacg agggggcccc ggtggttacc tccaccaact
3481 tggacacgac agcacggccg gcactgaact cgttgaccgg gatgcggttc gtcgccgcct
3541 tcctggtctt cttcacgcac gtcctgtcga ggctcatccc gaacagctac gtgtacgccg
3601 acggcctgga cgccttctgg cagaccaccg gacgggtggg ggtgtcgttc ttctttattc
3661 tcagcggttt cgtgctgacc tggtcggcgc gggccagcga ctcggtgtgg tcgttctggc
3721 gcagacgggt ctgcaagctc ttccccaacc acctggtcac cgccttcgcc gccgtggtgt
3781 tgttcctggt caccgggcag gcggtgagcg gtgaggcgct gatcccgaac ctcctgctga
3841 tccacgcctg gttcccggcc ctggagatct ccttcggcat caacccggtg agctggtcgt
3901 tggcctgcga ggcgttcttc tacctgtgct tcccgctgtt cctgttctgg atctccggta
3961 tccgcccgga gcggctgtgg gcctgggccg ccgtggtgtt cgccgcgatc tgggcggtac
4021 cggtggtcgc cgacctcctg ctgccgagtt ccccgccgct gatcccgggg cttgagtact
4081 ccgccatcca ggactggttc ctctacacct tccctgcgac gcggagcctg gagttcatcc
4141 tcgggatcat cctggcccgc atcctgatca ccggtcggtg gatcaacgtc gggctgctcc
4201 ccgcggtgct gttgttcccg gtcttcttcg tcgcctcgct cttcctgccg ggtgtctacg
4261 ccatctcctc gtcgatgatg atccttcccc tggttctgat catcgccagc ggcgcgacgg
4321 ccgacctcca gcagaagcgc accttcatgc gtaaccgggt gatggtgtgg ctcggcgacg
4381 tctccttcgc gctctacatg gtccacttcc tggtgatcgt ctacggggcg gacctgctgg
4441 ggttcagcca gaccgaggac gccccgctgg gtctcgcact cttcatgatc attccgttcc
4501 tcgcggtctc cctggtgctg tcgtggctgc tgtacaggtt cgtcgagcta cccgtcatgc
4561 gtaactgggc ccgcccggcc tccgcccggc gcaaacccgc cacggaaccc gaacagaccc
4621 cttcccgccg gtaagaagga cggtgcatcg gtgaccacct acgtctggtc ctatctgttg
4681 gagtacgaga gggaacgagc cgacatcctc gatgcggtgc agaaggtctt cgccagtggc
4741 agcctgatcc tcggtcagag tgtggagaac ttcgagaccg agtacgcccg ctaccacggg
4801 atcgcgcact gcgtgggcgt cgacaacggc accaacgctg tgaaactcgc gctggagtcg
4861 gtaggtgtcg gacgcgacga cgaggtcgtc acggtctcca acaccgccgc ccccacagtc
4921 ctggccatcg acgagatcgg cgcccggccg gtcttcgtgg acgtccgcga cgaggactac
4981 ctcatggaca ccgacctggt ggaggcggcg gtcaccccgc gtaccaaggc catcgtcccg
5041 gtgcacctgt acgggcagtg cgtggacatg acagccctgc gggaactggc cgaccggcgg
5101 ggcctcaagc tcgtggagga ctgcgcccag gcccacggtg cccggcggga cggtcggctg
5161 gccgggacga tgagcgacgc ggcggccttc tcgttctacc cgacgaaggt cctcggcgcc
5221 tacggcgacg gcggcgcggt cgtcaccaac gacgacgaga cagcccgcgc cctgcgacgg
5281 ctgcggtact acgggatgga ggaggtctac tacgtcaccc ggaccccggg tcacaacagc
5341 cgcctcgacg aggtgcaggc cgagatcctg cggcgcaaac tgacccggct cgacgcgtac
5401 gtcgcgggtc ggcgggcggt cgcccagcgg tacgtcgacg ggctcgccga cctccaagac
5461 tcgcacggcc tcgaactccc agtggtcacc gacggcaacg aacacgtctt ctacgtgtac
5521 gtcgtccgcc acccgcgccg cgacgagatc atcaagcgtc tccgggacgg gtacgacatc
5581 tccctgaaca tcagctaccc ctggccggtg cacaccatga ccggcttcgc ccacctcggt
5641 gtcgcgtcgg ggtcgctgcc ggtcaccgaa cggctggccg gcgagatctt ctcccttccc
5701 atgtacccct ccctccctca cgacctgcag gacagggtga tcgaggcggt gcgggaggtc
5761 atcaccgggc tgtgacgagc ccgcgtgtcg tcagcgaaga cccactctgg aagggccggt
5821 catgccgaac agccactcga ccacgtcgag caccgacgtc gccccgtacg agcgggcgga
5881 catctaccac gacttctacc acggccgtgg caagggatac cgtgccgaag ccgacgcgct
5941 cgtggaggtc gcccgcaagc acaccccaca ggcggcgacc ctgctggacg tggcctgcgg
6001 gaccggatcc cacctggtcg agctggcgga cagcttccgg gaggtggtgg gggtcgacct
6061 gtcggccgcc atgctcgcca ccgccgcccg caacgacccc gggcgggaac tgcaccaggg
6121 cgacatgcgc gacttctccc tcgaccgcag gttcgacgtc gtcacctgca tgttcagctc
6181 caccggttac ctcgtcgacg aggccgaact ggaccgtgcc gtggcgaacc tggccggtca
6241 cctcgcgcct ggcggcaccc tcgtcgtgga gccctggtgg ttcccggaga cgttccggcc
6301 cggctgggtc ggggccgacc tggtcaccag cggtgaccgg aggatctccc ggatgtcgca
6361 caccgtcccg gcgggtctgc ccgaccgcac cgcctcccgg atgaccatcc actacacggt
6421 ggggtcaccg gaggccggga tcgagcactt caccgaggtg cacgtgatga ccctgttcgc
6481 ccgcgccgcc tacgagcagg ccttccagcg ggcgggcctg agctgctcgt acgtcggcca
6541 cgacctgttc tcgccgggcc ttttcgtcgg ggtcgccgcg gagccggggc ggtgagggtc
6601 gaggagctgg gcatcgaggg ggtcttcacc ttcaccccgc agacgttcgc cgacgagcgg
6661 ggggtgttcg gcacggcgta ccaggaggac gtgttcgtgg cggcgctcgg ccgcccgctg
6721 ttcccggtgg cccaggtcag caccacccgg tcccggcggg gtgtggtccg gggggtcjcac
6781 ttcacgacga tgcccggctc catggcgaag tacgtctact gcgccagggg tagggcgatg
6841 gacttcgccg tcgacatccg gcccggttcc ccgaccttcg gccgggccga gccggtcgag
6901 ctctccgccg agtcgatggt cgggctgtac cttcccgtgg gcatgggcca cctgttcgtc
6961 tccctggagg acgacaccac cctcgtctac ctgatgtccg ccggttacgt ccccgacaag
7021 gaacgggcgg tgcaccccct ggatccggag ctggcgttgc cgatcccggc cgacctcgac
7081 ctcgtcatgt ccgagcggga ccgggtcgca cccaccctcc gggaggcccg ggaccagggg
7141 atcctgcccg actacgccgc ctgccgggcc gccgcgcacc gggtggtgcg gacgtgaccc
7201 cggccgggcg tgcgggccgg tggtggtgct cggcgcgtcg ggtttcctgg gttcggcggt
7261 cacccacgcc ctggccgacc tcccggtgcg ggtgcggctc gtcgcccggc gggaggtcgt
7321 cgtgccctcc ggtgccgtcg ccgactacga gacgcaccgg gtggacctca ccgaacccgg
7381 agcgctcgcg gaggtggtcg cggacgcccg ggcggtcttc ccgttcgccg cccagatcag
7441 gggtacgtca gggtggcgga tcagcgagga cgacgtggtc gccgaacgga cgaacgtcgg
7501 cctggtccgg gacctgatcg ccgtcctgtc ccgctcgccg cacgccccgg tggtggtctt
7561 cccgggcagc aacacgcagg tcggcagggt caccgccggc cgggtcatcg acggcagcga
7621 gcaggaccac cccgagggcg tctacgacag gcagaaacac accggggaac agctgctcaa
7681 ggaggccact gcggccgggg cgatccgggc gaccagtctg cggctgcccc cggtgttcgg
7741 ggtgcccgcc gccggcaccg ccgacgaccg gggggtggtc tccaccatga tccgtcgggc
7801 cctgaccggc caaccgctga cgatgtggca cgacggcacc gtccggcgtg aactgctgta
7861 cgtgaccgac gccgcccggg ccttcgtcac cgccctggac cacgccgacg cgctcgccgg
7921 acgccacttc ctgttgggga cggggcgttc ctggccgctg ggcgaggtct tccaggcggt
7981 ctcgcgcagc gtcgcccggc acaccggcga ggacccggtg ccggtggtct cggtgccgcc
8041 tccggcgcac atggacccgt cggacctgcg cagcgtggag gtcgaccccg cccggttcac
8101 ggctgtcacc gggtggcggg ccacggtcac gatggcggag gcggtcgacc ggacggtggc
8161 ggcgttggcc ccccgccggg ccgccgcccc gtccgagccc tcctgaccgg ggtcacccgg
8221 gttcgtccta cggcaccggc ccgtcgacgg ccggtgccgg gaagatcgct tcgagttccc
8281 ggagttcctc ctcgcccagc gtcagctcgg cggcccgtaa cgccgagtcg agctgctcgg
8341 gtgtgcgggg gccgatgaca gcgcccagga tcccggggcg ggacaggacc caggccagac
8401 cgacctcggc cgggtccgcg ccgaggcgtc ggcagtagtc ctcgtacgcc tcgacgaggg
8461 ggcgtacggc ggggaggagc acctgggcgc gtccctgcgc cgacttgacg gcggttccgg
8521 ctgccaactt ctccagtacg ccgctgagca gcccgccgtg caggggggac caggcgaaca
8581 cgcccacccc gtacgcctgg gcggcgggca ggacgtccag ctcggggtgg cggacggcca
8641 ggttgtacag gcactggtgg gagatcatgc cgagcaggtt gcggcgtgcc gcgctctcct
8701 gggcggcggc gatgtgccag cccgccaggt tggaggagcc gacgtacccg accttcccac
8761 tgccgaccag atgttcggcg gcctgccaca cctcgtccca cggtgcggcg cggtcgatgt
8821 ggtgcgtctg gtagatgtcg atgtggtcga ccccgaggcg gcggagggag ttctcgcagg
8881 cggcgacgat gtgtcgggcg gagagcccgc cgtcgttgac ccgttcgctc atctcgctgc
8941 ccaccttggt cgccaggacg gtctcctcgc gtcgacctcc gccctgggcg aaccaccgtc
9001 cgacgagttc ctcggtgtgg cccttgtaga gccgccagcc gtagatgtcg gcggtgtcga
9061 tgcagttgac gccccgctcg agggcgtggt ccatcagccg cagcgcgtcg tcgtcggtca
9121 cccgtccact gaagttcacg gtgccgagcc agagtcggct ggtgtgcaac gccgatcgtc
9181 cgacgcgtac ccgggcggac ccggccccgg tggttcccac gtcggtcacc tgtcggcgcg
9241 gtgctggtgg gcgagcgcct ccagcacggg tacgacctcg gcgggggtcg gcgcggccag
9301 cgcctcctgc cgcagcttct cggcgttctc ggcgtgggaa cggtcctcga ccactgtggc
9361 gagagcctgc cagagggtgt cggcgtcgac ctcgtccgga cggaggaaga cacccgctcc
9421 cagctcggcg gtgcgctgac cacgcaggac acagtcccac tcgtgggcga cggagatctg
9481 cggtacgccg tggtgcagcg cggtggccca gcttccggca ccgccgtggt ggatgacggc
9541 ggcacagccc ggcagcagga tgttcatggg aacgaagtcc accaggcgga cgttgtccgg
9601 caccgacgcc ggatcgagcc cggagcgggt caccacgatc tcgccgtcga accgcgcgag
9661 ggtggccagt gtccggagga actcctgcgg gttcgaggtg atgcccagcg ccgagtatcc
9721 cccggtgaag cagacccggc ggactccgtc cgaggtcctg agccactgcg gcacgacgga
9781 ggacccgttg tagggcaaag tccgggtgtg caccgactcc agtccggtct ccaggcggaa
9841 gctctcgggc agctggtcga cgctccactg tccgacagcg aggtcctcgc tgtagtcgag
9901 gccgaaccgg ccggcgacct cggtgagcca gccgccgagc gggtccggcc ggtcgtcggc
9961 gggacgctgc ccgcgcaggt cctgggagcg gctgcggaag tagccggtga ggtcgctgcc
10021 ccacagcagc cgggcgtggg cggccccgca ggccttggcc gcgaccgccc cggcgaaggt
10081 gaagggctcc cagagcacca ggtcgggacg ccagtccatg gcgaactcga cgagttcgtc
10141 gacgaaggag tcgttgttga ccaccgggaa gacgaaccgg gaggtggcct cctcgatgcc
10201 gtgcaggaac tcccacgagc gcagttccgg tccgcgtcgg gcgaagtcca ggtcggtggt
10261 gtagcggtgc acctgcgcgg cggcctcagg ggagatgtcg aagagtcggt ggtccgagcc
10321 gagtggcacc gaggtcagtc ccgcgccgac gacgacgtcg gtgagctcgg gctgactggc
10381 cacccggacg tcgtggccgg cggtgtgcag cgcccaggcc agggggacga ggccctggaa
10441 gtgggtacgg tgcgcgaacg aggtgagcag gacccgcact ggtcactcct tggtcgagat
10501 gagggcggca acggtccggt cgatgccctc ggccagcggc acccgggggt gccagccggt
10561 cagcgtccgg aactcggtgg agtcgaagtc gtcgctgcgg aagtcgttgg cctcggcgtt
10621 ctccggtgga gggacgctga cgacgggcac
10681 ggcggcgacg gtctcgaaga tctcgccgag 
10741 gacgtcgccg accagcgcct cgtggttgtg 
10801 ctcgacgtgc aggaggttgc ggcgcacgct 
10861 ggcgagggct cgccggatca tggcggtgac 
10921 gctgtggccg tagatcgcgg gcaggcgcag 
10981 ggcctgacgc aggatccgct cggcctcgat 
11041 ggggttcgcg gcctgggtgg tgctggcgaa 
11101 ccgcagcgcg gcgacgaggt cgcgcatgat 
11161 cgtggcggcg ctgcgccagg tcgacccgcc 
11221 gtcggtgtcg gcgacgacct gcgcgacccg 
11281 ctcgatcccg gcgctgcctg gtggctggtc 
11341 tcggagaggg tgtgtggtaa attcgcgaag 
11401 gagaagtgtg acatgtcttg tcatctacta 
11461 tccatttgtt ccccccaggg tggtgtcggg
11521 cctctttcga gcgggtgctg aggcttcccg 
11581 tgtcggggaa agggcggatc gaggagttcg 
11641 gatccgggtc gacgccccga cgcgtgacag 
11701 ttttcggcga tggtcgcaga ttcctcccga 
11761 ggccgcaccg tcggtggcct cgtcgggggt 
11821 ccgtgccgac cagggtcggt ccgtcgccga 
11881 ccggcggcca ccgcccgatc gtgcccacct 
11941 tgatcgacac ttccggcgac gctatcaccg 
12001 tcgcgctttc caaacaggga aaacagcagc 
12061 agcgaagagt ctcgatgggg tcaaggtgaa 
12121 tttcttcagc caccctcgac gttcatacaa 
12181 gtggttgacg tgcccgatct actcggcacc 
12241 ccgtggcccc tgtgcggtca caacgaaccg 
12301 gcatatctcg aaggcatttc cgaggatgac 
12361 gagacacgcg cgcaggacgg gccgcaccgc 
12421 ctgaccgccg cgctcgccgc cctcgcccag 
12481 gtcgcccgac ccacggcacc ggtggtgttc 
12541 ggcatggcga cccgactgct cgccgagtcg 
12601 gagcgggcct tcgacgaggt caccgactgg 
12661 cacctgcgcc gcgtcgaggt ggtccagccc 
12721 gccctgtggc ggtcgttcgg ggtgcgaccc 
12781 ctggccgccg ccgaggtctg cggcgccgtc 
12841 ctgtggagcc gcgagatggt cccactggtg 
12901 tccccggccg agctggcagc ccgggtcgag 
12961 gtcaacggtc cccggtcggt gctgctcacc
13021 gccgagctgg cggcacaggg cgtacgcgcc 
13081 tcggcgcagg tcgacgccgt cgccgagggc 
13141 ggcgactccg acgtgcccta ctacgccggc 
13201 ctcggcgccg accactggcc gcgcagtttc 
13261 cgtgcggtcc tggaactgca gcccggcacg 
13321 gcggcctccc tgcagcagac cctcgacgag 
13381 ctgcaacgcg accagggcgg tctgcggcgg 
13441 ggtggcgtga cagtcgactg gaccgccgcc 
13501 tcggccgtcg ccgtcgagac cgacgaggga 
13561 gaccacgtac tgcgcgcgcg gctgctggag 
13621 gggcgggagg tcgacgcccg ggccaccttc 
13681 gtgcagctgc ggacccgcct cgccacggcg 
13741 tacgaccacc cgaccccgca cgccctcacc 
13801 ccggggcggg gtgaggagac ggcacacccg 
13861 gtggtcgcca tggcgtgccg gctgcccggc 
13921 ctgctggccg aggggcggga cgccgtcggc 
13981 gactcgctgt tccacccgga cccgacccgg 
14041 ttcctcaccg gcgccacctc cttcgacgct 
14101 ctggccgtcg agccgcagca gcggatcacg 
14161 gccgggatcc ccccgacgtc gttgcggacc 
14221 ccccaggagt acggcccccg gctggccgag 
14281 accgggacca ccaccagcgt cgcctccggt 
14341 ccggcgatca gcgtcgacac cgcctgctcg 
14401 cagtcgctgc ggcgcggcga gtcgacgatg 



cgcagggttg ccggtctgac gtgccacgct 
gggtcgggcc tcgtccgcgc tcggcgtcca 
cagtgcggcg gtgaacgcgg tggccacgtc 
gccctcgtgc cacatcgtga tcggctcacc 
gacaccccgg ccggtctgcc ccgacgggcc 
gatcaccccg tcgacgaccc cgtcctcggt 
cttgtgctgg gcgtaccggc tgggggcggc 
caggagcacc ggcgcgggtc cgggtcttgc
gcccgcgttg acgcgttcgg cctcgggcac 
ggcggcgtag gcgaccagat gcacgacgac 
gccgggttcg agcaggtcga ctcgaaggtg 
gcgagacccg gtgcgcgcga cggcccgcag 
aagggcgctt ccgacgaatc cagaaacgcc 
atgcattccg atagccaccg gcgcatggaa 
tgacaaatcc ggcctcaggt cggcctcaag 
cgtaccctcg gtggcctgcg ttcgggcggg
gtagggcgtc gcggcgcgta ctccgggact 
ggcgtcgatc cgtgccgccc gtaccgccgg 
cgtggtggac tcattggttc tcccgggtgt 
gtcggagacc gggtcgatcg ccgtccccgg 
ggtgggtcac cgtcgggtgg acccggtccg
tcgcctccgc gggtaaatgc ttcgtcgatc 
gagcattccc cggcaccacc ggtcgatgcc 
tcacagcggt tccaggcgcc gggcaatcct 
ttctgtcaca gatgtttttg ttaaatgtac 
ttggccggca tctctaccaa gggggagtga
cggactccgc acccagggcc gctcccattc 
gagctgcggg cccgcgcccg tcaattgcac 
gtggtggccg tcggcgccgc cctcgcgcgc 
gccgtcgtcg tggcctcctc ggtcaccgag
ggccgcccac acccctcggt ggtacgcggt 
gtcctgcccg gtcagggcgc ccagtggccc 
cccgtcttcg ccgcggcgat gcgggcctgc 
tcgttgaccg aggtcctgga ctcacccgag 
gcgctcttcg cggtgcagac ctcactggcc
gacgccgtac tcggacacag catcggtgag 
gacgtcgagg ccgccgcgcg ggccgccgcc 
ggccggggtg acatggcggc ggtggcgctc 
cggtgggacg acgacgtcgt gccggccggg 
ggcgctcccg agcccatcgc acggcgggtc 
caggtcgtca acgtgtcgat ggcggcgcac 
atgcgctcgg cgctgacctg gttcgccccc 
ctcaccggcg ggcggctgga cacccgggaa 
cggctcccgg tgcgcttcga cgaggcgacc 
ttcatcgagt cgagcccgca cccggtgctg 
gtcgggtccc cggccgcgat cgtgccgacc 
ttcctgctcg ccgtggcgca ggcgtacacc 
taccccgggg tgacccccgg ccacctgccg 
ccctcgacgg agttcgactg ggccgcgccc 
atcgtcggcg ccgagacggc cgcgctcgcc 
cgggaactgg gcctcgactc ggtcctcgcg 
accgggcggg atctgcacat cgccatgctc 
gaggcgctgc tgcgcggccc gcaggaggag 
acggaggccg aacccgacga acccgtcgcc 
ggcgtcacct caccggagga gttctgggag 
gggctgccca ccgaccgggg atgggacctg 
tcgggcacgg cgcaccagcg cgctggtggc 
gccttcttcg ggctgtcgcc acgggaggca 
ttggagctgt cgtgggaggt gctggaacgc 
tcccggaccg gggtgttcgt cggtctgatc 
gggggtgagg gcgtcgaggg ctacctgatg 
cgggtcgcct acaccctcgg cctggagggg 
tcgtcgctcg tcgccgtgca cctggcgtgc 
gcgctcgccg gtggcgtgac ggtgatgccg
14461 acaccgggca tgctcgtgga cttcagtcgg
14521 aaggcgttct cggccgccgc cgacgggttc 
14581 ctggaacggc tctcggacgc ccgccgccac 
14641 accgctgtca actccgacgg cgcgagcaac 
14701 gtccgggtga tccgacaggc cctcgccgag 
14761 gtggagaccc acggcaccgg cacccgcctc 
14821 gacgcgtacg gcggtgaccg tgagcacccg 
14881 gggcacaccc aggccgccgc cggtgtcgcc 
14941 gccggtgtcc tgccccgcac cctgcacgcc 
15001 tcgggcgcga tcagcctgct ccaggagccc 
15061 cgggccgggg tgtcctcgtt cggcatcagc 
15121 gcgccgccga ccggtgacga cacccgaccc 
15181 ctctcggcga gcaccggcga ggcgttgcgc 
15241 cgcgagcacc ccgaccagga cctggacgac 
15301 gcgctggcgt accgtagtgg gttcgtgccc 
15361 gacgaactcg ccgccggtgg atccggggac 
15421 cgcgtcgtct tcgtcttccc cggccaggga 
15481 ctcgacggcg acccggtctt cgcctcggtg 
15541 tacctggact tcgagatcgt cccgttcctg 
15601 cacacgctct ccaccgaccg cgtcgacgtg 
15661 tccctggcgg cccggtggcg ggcgtacggg 
15721 cagggggaga ttgccgcggc gtgtgtggcc 
15781 gcggtggccc tgcgcagccg ggtcatcgcc 
15841 atcgccgcct ccgtcgacga ggtggcggcc 
15901 gtcaacggtc cgcgcgcggt ggtggtctcc 
15961 gcctcctgca ccgtcgaggg ggtgcgggcc 
16021 tcctcgcacg tcgaggccgt ccgtgacgcg 
16081 ctgccgggct tcgtgccgtt ctactcgaca 
16141 ctcgacgccg ggtactggtt tcgcaacctg 
16201 cgctccctcg ccgaccaggg gtacacgacg 
16261 accacggcga tcgaggagat cggtgaggac 
16321 ctgcgacgtg gggccggcgg tcccgtcgac 
16381 gccggcgtcg cagtggactg ggagtcggcg 
16441 ctgcccacgt acccgttcca gcgtgagcgc
16501 gtcgccgact ccgacgacgt ctcgtccctg 
16561 ccgggtgagc cgggacggct cgacggcacc 
16621 gacgaccggg tcgaggcggc gcggcaggcg 
16681 ctggtggtgg agccccggac gggccgggtc 
16741 ccggtggcgg gcgtgctctg cctgttcgct 
16801 ctggcggtga cgtcgttgtc ggacacgctc 
16861 cgggagtgtc cgatctgggt ggtcaccgag 
16921 ctccgcgacc cggcccacgg cgcgctctgg 
16981 cccgccgtct ggggcggcct ggtcgacgtg 
17041 cacctcggga cgaccctgtc cggcgccggc 
17101 acgtacgccc gccggtggtg cagggcgggc 
17161 ggcacggtgc tcgtcaccgg cggcaccggc 
17221 gcccggcagg gcaccccgtg cctggtgctg 
17281 gtcgaggagc tactcaccga actcgccgac 
17341 gacgtcaccg accgggagca gctccgtgcc 
17401 ctgtcggcgg tgttccacgt cgccgcgacg 
17461 ggtgaccgca tcgaacgggc caaccgggcg 
17521 ctgacccggg acgccgacct cgacgcgttc 
17581 ggcgcgccgg ggctcggcgg ctacgtcccg 
17641 cagcgacgca gcgagggact cccggccacc 
17701 gggatggccg agggtccggt cgccgaccgg 
17761 cccgaccagg ccgtcgaggg tctccgggtg 
17821 gtcgtcgaca tcaggtggga ccggttcctc 
17881 ctcttcgaca ccctcgacga ggcccgtcgg 
17941 gtggcggcgc tggccgggct gcccgtcggg 
18001 cggacgcacg cggctgccgt cctcggccac 
18061 gccttcgccg aactcggcgt cgactcgctg 
18121 actgcgaccg gggtccggct ggccacgacg 
18181 ctggccggac acctggccgc cgaactgggc 
18241 gaggccccga cggtggcccc gaccgacgag 



atgaactccc tcgcccccga cggacggtcc 
ggcatggccg aaggcgcagg gatgctcctg 
ggccacccgg tgctcgccgt gatcaggggc 
ggactctccg ccccgaacgg ccgggcccag 
tccgggctga cgccccacac cgtcgacgtc 
ggtgatccga tcgaggcacg ggcgctctcc 
ctgcggatcg gctcggtcaa gtccaacatc 
ggtctgatca aactggtgtt ggcgatgcag 
gacgagccgt caccggagat cgactggtcc 
gctgcctggc ccgccggcga gcggccccgc 
ggcaccaacg cacacgcgat catcgaggag 
gaccggatgg gcccggtggt gccctgggtg 
gcccgggcgg cgcggctggc cgggcaccta 
gtcgcctact cgctggccac cggtcgggcc 
gccgacgcgt ccacggcgct gcggatcctc 
gcggtgaccg gcaccgcccg cgccccgcag 
tggcagtggg cggggatggc agtcgacctg 
ctgcgggagt gcgccgacgc gttggaaccg 
cgggccgagg cgcagcgccg gacccccgac 
gtccagccgg tgctgttcgc ggtgatggtg 
gtggaaccgg cggccgtcat cggacactcc 
ggggcgctct cgctggacga cgcggcccgg 
accatgcccg gcaacggcgc gatggcctcg 
cggatcgacg ggcgggtcga gatcgccgcc 
ggcgaccgtg acgacctgga ccgcctggtc 
aagcggctgc cggtggacta cgcgtcgcac 
ctccacgccg aactcggcga gttccggccg 
gtcaccggcc gctgggtcga gcccgccgaa 
cgccacaggg tccggttcgc cgacgcggtc 
ttcctggagg tcagcgccca cccggtgctc 
cgtggcggtg acctcgtcgc tgtccactcg 
ttcggctccg cgctggcccg cgccttcgtg 
taccagggtg ccggggcgcg tcgggtgccg 
ttctggttgg aaccgaatcc ggcccgcagg 
cggtaccgca tcgaatggca cccgaccgat 
tggctgctgg cgacgtaccc cggtcgggcc 
ctggagtccg ccggggcgcg ggtcgaggac 
gacctggtgc ggcggctcga cgccgtgggt 
gtcgcggagc cggcggccga acactccccg 
gacctgaccc aggcggtggc cgggtcgggc 
aacgccgtcg ccgtcgggcc cttcgaacgg 
gccctcggtc gggtcgtcgc cctggagaac 
ccgtcgggtt cggtcgccga gctgtcgcgt 
gaggaccagg tcgccctccg acccgacggg 
gcgggcggca cgggccggtg gcagccccgg 
ggggtcggtc ggcacgtcgc ccggtggctg 
gccagccgcc ggggaccgga cgccgacggg 
ctgggcaccc gggccaccgt caccgcctgc 
ctcctcgcga ccgtcgacga cgagcacccg 
ctcgacgacg gcaccgtcga gaccctcacc 
aaggtgctcg gtgcccgcaa cctgcacgag 
gtgctcttct cctcctccac cgccgcgttc 
ggcaacgcct acctcgacgg tctcgcccag 
tcggtggcgt ggggtacctg ggcgggcagc 
ttccgccggc acggggtcat ggagatgcac 
gcactggtgc agggtgaggt agccccgatc 
ctcgcgtaca ccgcgcagcg ccccacccgg 
gccgcgcccg gtcccgacgc cgggccgggg 
gaacgcgaga aggcggtcct cgacctggta 
gcctcggccg agcaggtgcc cgtcgacagg 
tcggccctgg aactgcgcaa ccggctgacc 
acggtcttcg accacccgga cgtacggacc 
ggcggatcgg ggcgggagcg gcccgggggc 
ccgatcgcca tcgtcgggat ggcctgccgg
18301 ctgccggggg gagtggactc accggagcag
18361 accgcctcgg cggcacccgg ggaccggagc 
18421 acgacgggca cccgtaccgc cttcggcaac 
18481 gcgttcttcg ggatctcgcc gcgtgaggcg 
18541 ctggagacca cctgggaggc gctggagaac 
18601 acggacaccg gtgtcttcgt gggcatgtcc 
18661 cccgaggacg aggtcgacgg ctacctgttg 
18721 cggatcgcgt acgtgttggg gttggagggg 
18781 tcgtcgcttg tggcgttgca cgtggcggcg 
18841 gcggtggcgg gtggggtgtc ggtgatggcc 
18901 cagggcgcgt tggctccgga cggcaggtgc 
18961 ggtctggggg aggggtcggc cttcgtcgtg 
19021 gggcgtcggg tgttgggtgt ggtggtgggt 
19081 gggttggcgg cgccgtcggg ggtggcgcag 
19141 gcgggtgtgt cgggtgggga tgtgggtgtg 
19201 ggggatccgg tggagttggg ggcgttgttg 
19261 ggtccggtgg tggtgggttc ggtgaaggcg 
19321 gtggtgggtg tgatcaaggt ggtgttgggg 
19381 tgtcggggtg ggttgtcggg gttggtggat 
19441 ggggtgcggg ggtggccggt gggtgtggat 
19501 ggggtgtcgg ggacgaatgc tcatgtggtg
19561 gcggaacggc cggtggaggg gtcgtcgcgg 
19621 ccggtggtgc tgtcggcaaa gaccgaaacc 
19681 gaccacctgg agacgcaccc cgacgtcccg 
19741 gcccgccaac gcttcgacag gcgcgcggtc 
19801 gaacggctgc gcggcctcgc cgggggcgaa
19861 tcgggtggtg gtgtggtgtt tgtttttcct 
19921 cgggggttgt tgtcggttcc ggtgtttgtg 
19981 tcgtcggtgg tggggttttc ggtgttgggg 
20041 ttggatcggg tggatgtggt gcagccggtg 
20101 ttgtggcggt ggtgtggggt tgtgcctgcg 
20161 gcggcggcgg tggtggcggg ggtgttgtcg 
20221 cgggcgcggg cgttgcgggc gttggccggc 
20281 cgcgacgacg tacagaagct cctcgacagc 
20341 gcggtcaacg gccccgacgc ggtggtggtc 
20401 gtcgagcact gtgacgggat cggggtccgg 
20461 cactccgcac aggtcgagtc gctccgggag 
20521 ggccgcccgg cgacggtgcc gttctactcc 
20581 gaactggacg ccgactactg gtaccgcaac 
20641 gtcgaggcgc tggcagcgcg tgacctcacc 
20701 ctgtcgatgg cggtcgggga gacgcttgcc 
20761 ctggaacgcg acaccgacga cgtcgagcgc 
20821 cacggcgtac ccgtggactg ggcggcggtc 
20881 acctatccct tccagggacg gcggttctgg 
20941 gtcgccgact ggttccaccg ggtcgactgg 
21001 ctcgacggtc gctggctggt ggtcgtaccc 
21061 gaggtgcggg ccgccctcgc cgccggtggt 
21121 gtcaccgacc gggtcggtga cagcgacgcg 
21181 ggtgcggccg agaccctggc gctgctgcga 
21241 ctgtgggtgg tcaccgtggg ggccgtcgcc 
21301 gcgacggtgt gggggttggc ccttgtcgcc 
21361 ctgctggatc tgccgcagac accggacccg 
21421 gccggtgccg aggaccaggt agcggtccgc 
21481 cccaccccgg tcaccggagc cgggccgtac 
21541 gggggcaccg ccggtctggg tgccgtcacc 
21601 cacctcgccc tggtcagccg gcgcgggccg 
21661 gacctgaccg ggctcggcgt acgggtgtcg 
21721 tcggtcggcg ccctggtgca ggagttgaca 
21781 cacgctgccg gtctgcccca gcaggtgcca
21841 gacgtggtgg ccgtgaaggt cgacggcgcg 
21901 gaactgttcc tgctgttctc ctccggggcc 
21961 tacgccgccg gaaacgcctt cctggacgcc 
22021 cccgccacct cggtggcgtg ggggctctgg 
22081 gcggtgtcgt tcctgcgtga gcggggcgta 
ctgtgggagt tgatcgtctc cgggcgggac
tgggatccgg cggagttgat ggtctccgac 
ttcatgcccg gggcgggcga gttcgacgcg 
ttggcgatgg atccgcagca gcggcacgcc 
gccggtatcc ggcccgagtc gttgcgcggt 
catcaggggt acgccaccgg ccgcccgaag 
acaggcaaca ccgcgagcgt cgcctccggt 
ccggcgatca ctgtggacac ggcgtgttcg 
ggttcgttgc gttctgggga ctgtggtctg 
ggtccggagg tgttcaggga gttctcccgg 
aagcccttct cggacgaggc cgacggcttc 
ttgcagcggt tgtcggtggc ggtgcgggag 
tcggcggtga atcaggatgg ggcgagtaat 
cagcgggtga ttcggcgggc gtggggtcgt 
gtggaggcgc atgggacggg gacgcggttg 
gggacgtatg gggtgggtcg gggtggggtg 
aatgtgggtc atgtgcaggc ggcggcgggt
ttgggtcggg ggttggtggg tccgatggtg 
tggtcgtcgg gtgggttggt ggtggcggat 
ggggtgcgtc ggggtggggt gtcggcgttt 
gtggcggagg cgccggggtc ggtggtgggg 
gggttggtgg gggtggttgg tggtgtggtg 
gccctgcacg cccaggcacg tcgactcgcc 
atgaccgacg tggtgtggac gctgacgcag 
ctcctcgccg ccgaccggac ccaggccgtg 
ccggggaccg gtgtggtgtc gggggtggcg
ggtcagggtg gtcagtgggt ggggatggcg 
gagtcggtgg tggagtgtga tgcggtggtg
gtgttggagg gtcggtcggg tgcgccgtcg 
ttgttcgtgg tgatggtgtc gttggcgcgg
gcggtggtgg gtcattcgca gggggagatc 
gtgggtgatg gtgcgcgggt ggtggcgttg 
cacggcggca tggcctcggt acgccgaggc 
ggcccctgga cggggaagct ggagatcgcc 
tccggcgacc cccgagccgt gaccgagctg 
gcccggacga tccccgtcga ctacgcctcc 
gagctgctct ccgtcctggc cgggatcgag 
accctcaccg gtgggttcgt cgacggcacc 
ctgcgccacc cggtgcggtt ccacgccgcc 
acgttcgtcg aggtcagccc gcaccccgtg 
gacgtggagt ccgccgtcac tgtgggcacc 
ttcctcacct ccctcgccga ggcgcacgtc 
ctcggctccg gaaccctggt cgacctgccc 
ctgcaccccg accgtggtcc gcgtgacgat 
acggcgacgg ccaccgacgg gtcggcccga 
gaggggtaca cggacgacgg ctgggtcgtg 
gccgagccgg tggtgacgac ggtcgaggag 
gtggtgtcga tgctcgggct ggccgacgac 
cgactcgacg cacaggcgtc caccacccca 
cccgccggtc cggtgcagcg ccccgaacag 
tccctggaac gcggacaccg gtggaccggc 
cagctacgac cccggctggt cgaggcgctc 
gccgacgccg tacacgcccg tcggatcgtc 
accgccccgg gcgggacgat cctcgtcacc 
gcccgatggc tcgccgagcg cggtgccgaa 
ggcaccgccg gcgtcgacga ggtggtccgg 
gtgcactcct gcgacgtcgg cgaccgcgag 
gcagccggtg acgtggtccg gggggtggtc 
ctgaccgaca tggacccggc cgacctcgcc 
gtgcacctgg ccgacctgtg cccggaggcc 
ggggtgtggg gcagtgcccg tcagggtgcg 
ttcgcccgac accggcggga ccggggtctg 
gcggccgggg ggatgacagg ggaccaggag 
cggccgatgt cggtgccgag ggcactggaa
22141 gcgctggaac gggtcctcac cgccggggag accgcggtgg tcgtcgccga cgtcgactgg
22201 gcggccttcg ccgagtcgta cacctccgcc cggccccggc cgctgctcca ccggctcgtc 
22261 acacctgcgg cggcggtcgg cgagcgcgac gagccgcgtg agcagaccct ccgggaccgg 
22321 ctggcggccc tgccccgggc cgagcggtcg gcggagctgg tacgcctggt ccggcgggac 
22381 gccgcagccg tgctcggcag cgacgcgaag gccgtacccg ccaccacgcc gttcaaggac 
22441 ctcgggttcg actcgctggc cgcggtccgg ttccgtaacc ggctggccgc ccacaccggt 
22501 ctgcgtctgc cggccaccct ggtcttcgag cacccgaacg ccgcagccgt cgccgacctc 
22561 ctccacgacc gactcggcga ggccggcgag ccgacccccg tccggtcggt gggcgccgga 
22621 ctggccgcgc tggagcaggc cctgcccgac gcctccgaca cggagcgggt cgagctggtc 
22681 gagcgcctgg aacggatgct cgccgggctc cgccccgagg ccggagccgg ggccgacgcc 
22741 ccgaccgccg gtgacgacct gggggaggcc ggcgtcgacg aactcctcga cgcgctcgaa 
22801 cgggaactcg acgccaggtg aacccgaact gaccgcagcc gcagccgaag cagagaccga 
22861 ggacctgtga ctgacaacga caaggtggcg gagtacctcc gtcgtgcgac gctcgacctg 
22921 cgggccgccc gcaagcgcct gcgcgagctg caatccgacc cgatcgcggt cgtcggcatg 
22981 gcctgccgcc taccgggcgg ggtgcacctc ccgcagcacc tgtgggacct cctgcgccag 
23041 gggcacgaga cggtgtccac cttccccacc gggcgcggct gggacctggc cgggctcttc 
23101 cacccggacc ccgaccaccc cggcaccagc tacgtcgacc ggggtgggtt cctcgacgac
23161 gtggcgggct tcgacgccga gttcttcggg atctccccgc gcgaggccac ggccatggac 
23221 ccgcaacagc ggctgctgtt ggagaccagt tgggagctgg tggagagcgc cggcatcgat 
23281 ccgcactccc tgcgtggcac cccgaccggc gtcttcctcg gcgtggcgcg gctcggctac 
23341 ggcgagaacg gcaccgaagc cggtgacgcc gagggctatt cggtgaccgg ggtggcaccc
23401 gctgtcgcct ccgggcggat ctcctacgcc ctcgggctgg agggtccgtc gatcagcgtg 
23461 gacaccgcgt gctcgtcgtc gttggtggcg ctgcacctgg cggtcgagtc gctgcggctg 
23521 ggcgagtcga gtctcgctgt cgtcggcggg gcggcggtca tggcgacacc aggggtgttc 
23581 gtcgacttca gccgccagcg ggcgttggcc gctgacggca ggtcgaaggc cttcggggcc 
23641 gccgccgacg ggttcggctt ctccgagggg gtctccctcg tcctgctcga acggctctcc 
23701 gaggccgaaa gcaacggcca cgaggtgttg gctgtcatcc gtggctccgc cctcaaccag 
23761 gacggggcca gcaacggtct cgccgcgccg aacgggaccg cccagcgcaa ggtgatccgg 
23821 caggcgctac gaaactgcgg cctgaccccg gccgacgtgg acgccgtgga ggcgcacggc 
23881 accggcacca cgctcggcga cccgatcgag gccaacgccc tgctggacac ctacggccgt
23941 gaccgggatc cggaccaccc gctgtggctg gggtcggtga agtcgaacat cggccacacg 
24001 caggcggcgg cgggcgtcac cgggctgctc aagatggtgc tggcactgcg ccacgaggaa 
24061 ctgcccgcca ccctgcacgt cgacgagccc accccgcacg tggactggtc ctcgggagcg 
24121 gtacgcctgg cgacccgggg ccggccgtgg cggcggggtg accggccgag gcgggccggg 
24181 gtgtcggcgt tcggcatcag cgggaccaac gcccacgtga tcgtcgagga ggcacccgag 
24241 cggaccaccg agcgcaccgt cggcggcgac gtcggcccgg tcccgctcgt ggtgtccgcc 
24301 cggtcggcgg cggcgctacg ggcccaggcg gcccaggtcg ccgagctggt ggagggctcc 
24361 gacgtcgggc tggcggaggt cgggcggagc ctggccgtga cccgggcgcg acacgagcac 
24421 cgggcggcgg tggtggcgtc gacccgggcc gaggcggtgc gggggctgcg cgaggtcgcg 
24481 gcggtcgaac cgcgcggcga ggacaccgtc accggggtcg ccgagacgtc cgggcgcacc 
24541 gtcgtcttcc tcttcccggg acaggggtcc cagtgggtcg ggatgggcgc ggagctgctg 
24601 gactcggcac cggcgttcgc cgacacgatc cgcgcctgcg acgaggcgat ggcaccgttg 
24661 caggactggt cggtctccga cgtgctccgg caggagccgg gggcaccggg actggaccgg 
24721 gtcgacgtgg tgcagccggt gctgttcgcg gtgatggtgt cgttggcgcg gttgtggcag 
24781 tcgtacgggg tcacccccgc tgcggtggtg gggcactcgc agggggagat cgccgccgcc 
24841 cacgtggcgg gtgcgctctc cctcgccgac gcggcgaggc tggtggtggg ccgcagccgg 
24901 ttgctgcggt cgctgtccgg gggcggcggc atgagcgccg tcgcgctcgg tgaggccgag 
24961 gtacgccgcc gactgcggtc gtgggaggac cggatctccg tggccgccgt caacggaccc 
25021 cggtcggtgg tggtggccgg ggaaccggag gcgctgcggg agtggggacg ggagcgggag 
25081 gccgagggcg tacgggtccg cgagatcgac gtcgactacg cctcgcactc gccgcagatc 
25141 gacagggtcc gtgacgaact cctgacggtc acgggggaga tcgagccccg gtcggcggag 
25201 atcaccttct actcgacggt cgacgtccgt gctgtcgacg gcaccgacct ggacgcgggg 
25261 tactggtacc gcaacctgcg ggagacggtc cggttcgccg acgcgatgac ccggttggcc 
25321 gactcgggat acgacgcgtt cgtcgaggtc agcccgcatc cggtggtggt gtcggcggtc 
25381 gccgaggcgg tcgaggaggc aggtgtcgag gacgccgtcg tcgtcggcac cctgtcccgg 
25441 ggcgacggcg gaccgggggc gttcctgcgg tcggcggcca ccgcccactg cgccggtgtg 
25501 gacgtcgact ggacgcccgc cctcccggga gctgcgacga tcccgttgcc gacgtacccg 
25561 ttccaacgga agccgtactg gctgcggtcg tctgctcccg cccccgcctc ccacgatctc 
25621 gcctaccggg tgtcctggac gccgatcacc ccgcccgggg acggcgtact cgacggcgac 
25681 tggctggtgg tgcaccccgg gggcagcacc ggatgggtcg acgggttggc ggcggcgatc 
25741 accgccggcg gtggccgggt cgtcgcccac ccggtggact ccgtgacctc ccggaccggc 
25801 ctggccgagg cgctcgcccg gcgggacggc acgttccggg gggtgctgtc gtgggtggcg 
25861 accgacgaac ggcacgtcga ggccggtgcg gtcgccctgc tgaccctggc gcaggcgttg 
25921 ggtgacgccg gaatcgacgc accactgtgg tgcctgaccc aggaggcggt ccgtaccccc
25981 gtcgacggtg acctggcccg accggcgcag
26041 cggctggagc tggcccgccg cttcggtggg 
26101 gccgggacgc gtctggtcgc ggcggtcctc 
26161 cgtggcgacc gtctctacgg ccgtcgcctg 
26221 gggttcaccc cgcacggcac cgtcctggtc 
26281 ctggcccggt ggctcgccga acggggtgcc 
26341 ggcgaggagt tgctgaccgc gatccgggcc 
26401 gaggcggagg cactgcgtac ggcgatcggc 
26461 gagacgttga cgaacttcgc cggcgtcgcc 
26521 gtcgcggcga agaccgcgct gccgacggtc 
26581 gaacgggagg tctactgctc gtcggtggcc 
26641 tacgccgccg gcagcgccta cctcgacgcc 
26701 gccagcgcct cggtggcctg gaccccgtgg 
26761 ctgcgcgagc gcggcctgcg cagcctcgac 
26821 ctgctccgcg ccggtgcg§t gtcggtggcc 
26881 gagggtttcg cggccatccg gccgaccccg 
26941 gaccccgacg gcgcgcccgt cgaccggccg 
27001 atcgcggcgc tgtccccgca ggaacagcgg 
27061 gtcgcggagg tgctgggaca cgagaccggc 
27121 gaactcggcc tcgactcgct gggctcgatg 
27181 ggcctgcgga tgccggcctc gctggtcttc
27241 tacctgcgtc gactggtcgt cggggactcc 
27301 accgacgagg ccgaacccgt cgccgtggtc 
27361 gccacccccg aggacctctg gcgggtggtg 
27421 cccaccgacc ggggctggga cctccggcgg 
27481 accagctacg tcgacagggg gggattcctc 
27541 ttcgggatca ccccccgcga ggcgctggcg 
27601 atcgcgtggg aggcggtgga acgggcgggc 
27661 accggcgtct tcgtcggcat gaacggccag 
27721 gaccggctca acggctacca ggggttgggc 
27781 gcctacacct tcgggtggga ggggccggcg 
27841 ctggtcgcca tccacctcgc catgcagtcg 
27901 gccggcgggg tgacggtcat ggccgacccg 
27961 gggctcgccg ccgacgggcg gtgcaaggcg
28021 gccgagggcg tcgcggcgct cgtcctcgaa 
28081 caggtgctgg cggtgctgcg cggcagcgcc 
28141 gccgccccga acgggccgtc gcaggaacgg 
28201 ctgcgtcccg ccgacgtcga catggtggag 
28261 ccgatcgagg ccggggcgct catcgcggcg 
28321 ctgggctcgg tgaagacgaa catcggccac 
28381 atcaaggcgg tcctggcgat gcggcacggc 
28441 ttgtccccgc acatcgactg ggcggacggg 
28501 tggccccccg gtgagcgccc ccgccgcgcc 
28561 aacgcccacg tcatcgtcga ggaggcaccc 
28621 gccccgggcg ggcccctgcc cttcgtcctg 
28681 caggcgcgga ccctcgccga acacctgcgc 
28741 gcccgtaccc tggccaccgg tcgcgcccgt 
28801 gaccgggagg gtgtctgcgc cgccctcgac 
28861 gtcgtcgccc cggcggtctt cgccgcccgt 
28921 tcgcagtggg tcggcatggc ccgtgacctg 
28981 atgggccggt gcgccgaggc gctgtcgccg 
29041 cgtggggtcg gcgaccccga cccgtacgac 
29101 gcggtgatgg tgtcgctggc gcggttgtgg 
29161 gtgggtcact cgcaggggga gatcgccgcc 
29221 gacgccgcca gggtggtggc gttgcgcagc 
29281 ggcatggtgt cggtcggcac ctcccgcgcc 
29341 gggcgggtcg cggtggcggc ggtgaacgga 
29401 gccgaactgg acgagttcct cgcggtggcc 
29461 gcggtgcgct acgcgtcgca ctccccggag
29521 gaactcggca ccgtcaccgc cgtcggcggc 
29581 gacctcctcg acaccacagc catggacgcc 
29641 gtgctgttcg agcacgccgt ccgcagcctc 
29701 gtcagcccgc accctgtgct gctgatggcg 
29761 ccggtcaccg gcgtgccgac gctgcgccgc 



gccgccctgc acggtttcgc ccaggtcgcc 
gtgctcgacc tgcccgccac cgtcgacgct 
gccggcggcg gcgaggacgt cgtcgccgtc 
gtcagggcga ccctgccgcc gcccggcggg 
accggcgcgg ccggtccggt gggcggtcgg 
acccgactcg tcctgcccgg cgcacacccg 
gccggtgcca ccgccgtggt gtgcgaaccg 
ggggagttgc cgaccgcgct cgtacacgcc 
gacgccgacc ccgaggactt cgccgccacc 
ctggcggagg tgctcggcga ccaccgcctc 
ggggtctggg gtggggtcgg catggccgcg 
ctggtcgagc accgtcgcgc ccgggggcac 
gccctgcccg gcgcggtcga cgacggtcgg 
gtggccgacg ccctcgggac gtgggaacgt 
gtcgccgacg tcgactggtc ggtcttcaca 
ctcttcgacg aactcctcga ccggcgcggg 
ggggagccgg cgggcgagtg gggtcgacga 
gagacgttgc tgaccctcgt cggcgagacg 
accgagatca acacccgtcg ggccttcagc 
gccctgcgtc agcgcctggc ggcccgtacc 
gaccacccga cggtcaccgc gctcgcgcgg 
gacccgaccc cggtacgggt gttcggcccc 
ggcatcggct gccggttccc cggcggcatc 
tccgagggca cctccatcac caccggattc 
ctctaccacc ccgacccgga ccaccccggc 
gacggggccc cggacttcga ccccgggttc 
atggacccgc agcagcggct caccctggag 
atcgacccgg agaccctcct cggcagcgac 
tcctacctgc aactgctgac cggggagggt 
aactcggcga gcgtgctctc cggccgtgtc 
ctgacggtgg acaccgcctg ctcgtcctcg 
ctgcgtcggg gtgagtgctc gctggcgttg 
tacaccttcg tggacttcag cgcacagcgg 
ttctccgcgc aggccgacgg gttcgccctc 
ccgttgtcca aggcgcggcg aaacggccac 
gtcaaccagg acggggccag caacggcctc 
gtgatcaggc aggccctgac cgcctccggg 
gcgcacggga cgggcaccga actcggcgac 
tacggccggg accgggaccg gccgctctgg 
acccaggccg ccgccggtgc cgccggggtg 
gtactcccga ggtcgctgca cgccgacgag 
aaggtcgagg tgctccgcga ggcacgacag 
ggggtgtcct ccttcggcgt cagcgggacc 
gccgaaccgg accccgaacc ggttcccgcc 
cacggacgca gcgtccagac ggtccggtcc 
accaccggcc accgggacct cgccgacacc 
ttcgacgtcc gggccgcagt gctcggcacc 
gcgctggcgc aggatcgccc ctcgcccgac 
acccccgtcc tggtcttccc cgggcagggg 
ctcgactcct ccgaggtgtt cgccgagtcg 
tacaccgact gggacctgct cgacgtggtc 
cgggtggacg tgctccagcc ggtgctgttc 
cagtcgtacg gggtgactcc gggtgcggtg 
gcgcacgtgg ctggtgcgtt gtcgttggcc 
cgggtgctgc gggagctcga cgaccagggc 
gagttggact cggtcctgcg ccggtgggac 
cccggcacgc tcgtggtggc cggacccacc 
gaggcccgcg agatgaggcc gcgtcggatc 
gtggcccggg tcgaacagcg gctcgccgcc 
acggtcccgc tctactccac cgccaccggg 
gggtactggt accgcaacct gcgccaaccg 
ctggagcggg gattcgagac gttcatcgag 
gtcgaggaga ccgccgagga cgccgagcgc 
gaccacgacg ggccgtcgga gttcctccgc
29821 aacctcctgg gggcgcacgt gcacggggtc gacgtcgacc tgcgtccggc ggtcgcccac
29881 ggccgcctgg tcgacctgcc cacctacccc ttcgacaggc agcggctctg gcccaagccg 
29941 caccgcaggg ccgacacctc gtcgctgggg gtccgtgact cgacccaccc gctgctgcac
30001 gccgcagtcg acgtacccgg tcacggcgga gcggtgttca ccgggcggct ctcccccgac 
30061 gagcagcagt ggctgaccca gcacgtggtg ggtgggcgga acctggtgcc cggcagtgtc
30121 ctggtcgacc tcgcgctcac cgccggggcc gacgtcggcg tgccggtgct ggaggaactc 
30181 gtcctgcagc agccgctggt gttgaccgcc gccggtgcgt tgctgcgcct gtcggtcggc 
30241 gccgccgacg aggacgggcg gcggccggtc gagatccacg ccgccgagga cgtctccgac 
30301 ccggccgagg cccggtggtc ggcgtacgcg accgggaccc tcgccgtcgg cgtggccggc 
30361 ggcggccggg acggcacaca gtggcccccg cccggcgcca ccgccctgac gttgaccgac 
30421 cactacgaca ccctcgccga actgggctac gagtacgggc cggcgttcca ggcgctgcgc 
30481 gccgcgtggc agcacggcga cgtggtctac gcggaggtgt ccctcgacgc cgtcgaggag 
30541 gggtacgcgt tcgacccggt gctgctcgac gccgtcgccc agaccttcgg cctgaccagt 
30601 cgcgcccccg ggaagctccc cttcgcctgg cggggcgtca ccctgcacgc caccggggcc 
30661 actgcggtac gggtggtggc gacccccgcc ggaccggacg cggtggccct gcgggtcacc 
30721 gacccgaccg gtcagctcgt cgccacggtg gacgccctgg tcgtcaggga cgccggggcg 
30781 gatcgggacc agccgcgcgg ccgcgacggc gacctgcacc gcctggagtg ggtacggctg 
30841 gccaccccgg acccgacccc ggcggcggtg gtgcacgtgg cggccgacgg gctcgacgac 
30901 ctgctgcgcg ccggtggtcc ggcaccacag gccgtcgtcg tccgctaccg tcccgacggc 
30961 gacgacccga cggccgaggc ccgtcacggg gtgctctggg cggccacgct cgtgcgccgt 
31021 tggctcgacg acgaccggtg gcccgccacc accctggtgg tggccacgtc cgcaggggtc 
31081 gaggtctccc ccggggacga cgtgccgcgc cccggggccg ccgccgtgtg gggggtgctg 
31141 cgctgcgccc aggcggagtc cccggaccgc ttcgtgctcg tcgacggcga cccggagacg 
31201 cccccggcgg tgccggacaa tccgcagctc gcggtccgtg acggtgcggt gttcgtgcca 
31261 cggctgacgc cgctcgccgg tcccgtgccg gccgtcgccg accgggcgta ccggctggtg 
31321 cccggcaacg gcggctccat cgaggcagtg gccttcgccc ccgtccccga cgccgaccgg 
31381 cccctggcgc cggaggaggt acgcgtcgcc gtccgcgcca ccggcgtgaa cttccgtgac 
31441 gtcctgctcg cgctcggcat gtacccggaa ccggccgaga tgggcaccga ggcgtccggt 
31501 gtggtcaccg aggtcgggtc gggtgtccgg cggttcaccc ccggccaggc ggtgacgggc 
31561 ctgttccagg gggccttcgg gccggtggcg gtcgccgacc accggctcct caccccggtc 
31621 cccgacgggt ggcgggcggt ggacgccgca gccgtaccca tcgcgttcac caccgcccac 
31681 tacgcgctgc acgacctggc cgggttgcag gccgggcagt ccgtgccggt ccacgccgcc 
31741 gccggcgggg tggggatggc tgccgtcgcg ttggcccgtc gggccggggc ggaggtgttc 
31801 gccacggcca gcccggccaa acacccgacg ctgcgggcgc tcggcctcga cgacgaccac 
31861 atcgcctcgt cccgggagag cgggttcggt gagcggttcg ccgcgcgtac cggggggcgg 
31921 ggcgtcgacg tggtcctgaa ctcgctcacc ggcgacctgc tcgacgagtc cgcgcggctg 
31981 ctcgccgacg gcggggtctt cgtcgagatg ggcaagaccg acctgcggcc ggcggagcag 
32041 ttccggggcc ggtacgtccc gttcgacctg gccgaggccg gtcccgatcg gctcggcgag 
32101 atcctggagg aggtcgtcgg tctgctggcc gccggtgccc tcgaccggtt gccggtgtcg 
32161 gtgtgggagt tgtcggcggc cccggccgcg ctcacccaca tgagccgggg ccgacacgtg 
32221 ggcaagctcg tcctcaccca gcccgccccc gtgcaccccg acggaacggt gctggtcacc 
32281 ggcgggaccg gcaccctggg gcggctggtc gcccgccacc tggtgaccgg gcacggcgta 
32341 ccccacctcc tggtggccag ccggcgcggt ccggcggccc cgggcgcggc cgagctgcgc 
32401 gccgacgtcg aaggcctcgg cgcgaccatc gagatcgtcg cctgcgacac cgccgaccgg 
32461 gaggcgctcg cggcgctgct cgactcgatc cccgcggacc gtccgctgac cggggtggtg 
32521 cacaccgccg gggtcctggc cgacgggctg gtcacctcca tcgacgggac cgccaccgat 
32581 caggtcctgc gggccaaggt cgacgcggcg tggcacctgc acgacctgac ccgggacgcg 
32641 gacctgagct tcttcgtgct gttctcgtcg gcggcgtcgg tgctggccgg tcccgggcag 
32701 ggcgtgtacg cggcggccaa cggggtcctc aacgccctgg ccgggcaacg gcgggccctc 
32761 ggactgcccg cgaaggcgct cgggtggggc ctgtgggcgc aggccagcga gatgaccagc 
32821 ggcctcggtg accggatcgc ccgtaccggg gtcgccgcgc tgccgaccga gcgggcgctg 
32881 gccctgttcg acgcggctct gcgcagcggc ggggaggtgc tgttcccgct gtctgtcgac 
32941 aggtcggcgc tgcgccgggc cgagtacgtc cccgaggtgc tgcgcggcgc ggtccggtcc 
33001 acgccacggg ccgccaacag ggccgagacc ccgggccggg gcctgctcga ccgtctcgtc 
33061 ggtgcacccg agaccgatca ggtggccgcg ctggccgagc tggtccgctc gcacgcggcg 
33121 gcggtcgccg gctacgactc ggccgaccag ctgcccgaac gcaaggcgtt caaggacctc 
33181 gggttcgact cgctggcggc ggtggagctg cgcaaccggc tcggcgtcac caccggcgta 
33241 cggctgccca gcacgctggt gttcgaccac ccgacaccgc tggcggtggc cgaacacctg 
33301 cggtcggagt tgttcgccga ctccgcgccg gacgtcgggg tcggtgcgcg cctcgacgac 
33361 ctggaacggg cgctcgacgc cctgcccgac gcgcagggac acgccgacgt cggggcccgc 
33421 ctggaggcgc tgctgcgccg gtggcagagc cgacgacccc cggagaccga gccagtgacg 
33481 atcagtgacg acgccagtga cgacgagctg ttctcgatgc tcgacaggcg tctcggcggg 
33541 ggaggggacg tctaggtgac aggtcgattc cgccccgcgg cagtggaccg taccgccctg 
33601 acaggtccac cgggttcgcg tcgcctccca cacccgacgg ccggggtatc cacggaaggg 
33661 atccgatgag cgagagcagc ggcatgaccg
33721 ccgtcgccga actcgactcg gtgacaggtc 
33781 aaccgatcgc cgtcgtcggc atggcctgcc 
33841 cgttctggga gttcatccgc gacggtggtg 
33901 gctggccgcc ggcaccgcga ccccgcctcg 
33961 acgccgcctt cttcggcatc tcaccccgcg 
34021 tgatgctgga gatctcctgg gaggcgttgg 
34081 gcggcagcgc cggtggcgtc ttcaccggtg 
34141 acgaggcacc cgaggaggtg ctcggctacg 
34201 ccggacgggt ggcgtacacc ctggggttgg 
34261 gctcctccgg gctcaccgcg gtgcacctgg 
34321 ccctggtcct cgccggtggg gtcaccgtga 
34381 gcagccaggg cgggttggcc gaggacggcc 
34441 gcttcgggct cgccgagggg gccggggtcc 
34501 ccgagggccg gccggtgctg gccgtactgc 
34561 gcaacgggct caccgcgccg agcggccccg 
34621 agcgggcgcg gctgcgtccc gtcgacgtgg 
34681 ggctgggcga tccgatcgag gcgcacgccc 
34741 ccggccgccc gctctgggtc ggatcggtga 
34801 cgggggtggc cggggtgatg aagaccgtgc 
34861 cgttgcactt cgacgagccc tcgccgcacg
34921 tgtccgagac ccggccctgg ccggtggggg 
34981 tcggcatcag cggcaccaac gcgcacgtca 
35041 ccgacctcga cccgaccccc ggcccggcaa 
35101 ccaccgccga gccgggtgcg gaggcggtcg 
35161 ccctgcgcgc ccaggcggcc cggctcgccg 
35221 tgcgcgacac cgccttcacc ctggtcaccc 
35281 tcgtcggcgg gggcgaggag gtcctcgccg 
35341 tcgacggagc cgtcagcggg cgggcgcgcg 
35401 ggcagggcgc acagtggcag ggcatggccc 
35461 cggagtccat cgacgcctgc gagcgggcgc 
35521 aggtgctcga cggcgagcag tcgttggacc 
35581 cggtgatggt gtcgttggcg cggttgtggc 
35641 tgggtcactc gcagggggag atcgccgccg 
35701 acgccgccag ggtggtggcg ttgcgcagcc 
35761 ggatggcgtc gttcgggctc caccccgacc 
35821 gtgcgctgac tgtcgcctcg gtcaacggtc 
35881 gcccgttgga cgagctgatc gccgagtgcg 
35941 ccgtcgacta cgcctcacac tccccgcagg 
36001 cactggccgg ggtccgtccg gtgtcggccg 
36061 aggtcatcga aacggcgacg atggacgccg
36121 tgcgcttcca ggacgccacc aggcagctcg 
36181 tcagcccgca cccggtgttg acagtcggtg
36241 ccgacgccga tccgtgtgtc acaggcaccc 
36301 tccacaccgc gctcgccgag gcgtacaccc 
36361 tgggtgaggg acgcccggtc gacctgccgg 
36421 tcccggtccc cctgggccgg gtccccgaca 
36481 ggcaccccgt cgacctcggg cggtcctccc 
36541 cggcagtacc cccggcctgg acggacgtgg 
36601 ccgtcgtgtt gtgcaccgcg cagtcgcgcg 
36661 acggcaccgc cctgtccact gtggtctctc 
36721 acgaccccag cctggacacc ctcgcgttgg 
36781 tccccctgtg gctggtgacc agggacgccg 
36841 cggcccaggc catggtcggt gggctcggcc 
36901 ggggtggcct ggtggacctg cgcgaggccg 
36961 tactggccga cccgcgcggc gaggagcagt 
37021 cccgtctcgt cccggcaccg gcccgcgcgg 
37081 tcctggtcac cggcggcacc ggcggcatcg 
37141 cgggcgccga gcacctggtg ctgctcaaca 
37201 acctgcgtga cgaactggtc gcgctcggca 
37261 ccgaccgcga ccggttggcg gccgtcctcg 
37321 cggcggtgtt ccacgccgcc gggatctccc 
37381 gcgagttcac cgagatcacc gacgcgaagg 
37441 gtcccgagct ggacgccctc gtgctgttct 



aggaccgcct ccggcgctat ctcaagcgca 
ggctcgacga ggtcgagtac cgggcccgcg 
ggttccccgg gggtgtggac tcgccggagg 
acgcgatcgc cgaggcgccc acggaccgtg 
gtggtctcct cgcggagccg ggcgcgttcg 
aggcgctcgc gacggacccc cagcagcgcc 
agcgtgcggg tttcgacccg tcgagcctgc 
tcggtgcggt ggactacgga cccaggccgg 
tcggcatcgg caccgcctcc agcgtcgcct 
agggtccagc cgtcaccgtc gacaccgcct 
cgatggagtc gctgcgccgc gacgagtgca 
tgagcagccc gggtgcgttc accgagttcc 
gctgcaaacc gttctcccgc gccgccgacg 
tggtgctcca acggctgtcc gtcgcccggg 
gtggctcggc gatcaaccag gacggtgcca 
cccagcggcg ggtgatcagg caggcgttgg 
actacgtgga ggcccacggc accggcaccc 
tgctcgacac gtacggtgcc gaccgggaac 
agtccaacat cggtcacacc caggcggcgg 
tggcgctgcg gcatcgggag atcccggcga 
tcgactggga ccggggtgcg gtgtcggtgg 
agcgcccgcg ccgggcgggg gtgtcctcgt 
tcgtcgagga ggcgccgagc ccgcaggcgg 
ccggagcgac ccccggaacg gatgccgccc 
cactggtgtt ctccgcgcgc gacgagcggg 
accgtctcac cgacgacccg gccccctcgt 
gccgtgccac ctgggagcat cgggcggtcg 
gcctccgggc cgtcgccggg ggacgtcccg 
ccggccgccg ggtggtgctg gtcttccccg 
gggacctgct gcggcagtcg ccgaccttcg 
tcgccccgca cgtggactgg tcgctgcgcg 
ccgtcgacgt ggtgcagccg gtgctgttcg 
agtcgtacgg ggtgactccg ggtgcggtgg 
cgcacgtggc tggtgcgttg tcgttggccg 
gggtgctgcg ccgtctcggt ggtcacggcg 
aggccgccga gcggatcgcg cgcttcgcgg 
cccgttcggt ggtgctggcc ggggagaacg 
aggccgaggg cgtgaccgcc cgtcggatcc 
tggagtcgct gcgtgaggag ctgctcgccg 
ggatccccct gtactcgacc ctgaccggtc 
actactggtt cgccaacctc cgggagccgg 
ccgaggcggg gttcgacgcc ttcgtcgagg 
tcgaggccac cctcgaggca gtgctgcccc 
tgcgccgcga acgcggcggt ctcgcgcagt 
ggggggtgga ggtcgactgg cgtaccgcag 
tctacccgtt ccaacgacag aacttctggc 
ccggcgacga gtggcgttac cagctcgcct 
tggccggacg ggtcctggtg gtgaccggag 
tccgcgacgg cctggaacag cgcggggcga 
cccggatcgg cgccgcactc gacgccgtcg 
tgctcgcgct cgccgagggc ggtgctgtcg 
tccaggcgct cggcgcagcc gggatcgacg 
ccgccgtgac cgtcggagac gacgtcgatc 
gggtggtggg cgtggagtcc cccgcccggt 
acgccgactc ggcccggtcg ctggccgcca 
tcgcgatccg gcccgacggc gtcaccgtcg 
cgggtacccg gtggacgccg cgcgggaccg 
gcgcgcacct ggcccgctgg ctcgccggtg 
ggcggggagc ggaggcggcc ggtgccgccg 
cgggagtcac catcacggcc tgcgacgtcg 
acgccgcacg ggcgcaggga cgggtggtca 
ggtccacagc ggtacaggag ctgaccgaga 
tgcggggtac ggcgaacctg gccgaactct 
cctcgaacgc ggcggtgtgg ggcagcacgg 



37501 ggctggcctc ctacgcggcg ggcaacgcct tcctcgacgc cttcgcccgt cgtggtcggc
37561 gcagtgggct gccggtcacc tcgatcgcct ggggtctgtg ggccgggcag aacatggccg
37621 gtaccgaggg cggcgactac ctgcgcagcc agggcctgcg cgccatggac ccgcagcggg
37681 cgatcgagga gctgcggacc accctggacg ccggggaccc gtgggtgtcg gtggtggacc
37741 tggaccggga gcggttcgtc gaactgttca ccgccgcccg ccgccggccc ctcttcgacg
37801 aactcggtgg ggtccgcgcc ggggccgagg agaccggtca ggaatcggat ctcgcccggc
37861 ggctggcgtc gatgccggag gccgaacgtc acgagcatgt cgcccggctg gtccgagccg
37921 aggtggcagc ggtgctgggc cacggcacgc cgacggtgat cgagcgtgac gtcgccttcc
37981 gtgacctggg attcgactcc atgaccgccg tcgacctgcg gaaccggctc gcggcggtga
38041 ccggggtccg ggtggccacg accatcgtct tcgaccaccc gacagtggac cgcctcaccg
38101 cgcactacct ggaacgactc gtcggtgagc cggaggcgac gaccccggct gcggcggtcg
38161 tcccgcaggc acccggggag gccgacgagc cgatcgcgat cgtcgggatg gcctgccgcc
38221 tcgccggtgg agtgcgtacc cccgaccagt tgtgggactt catcgtcgcc gacggcgacg
38281 cggtcaccga gatgccgtcg gaccggtcct gggacctcga cgcgctgttc gacccggacc
38341 ccgagcggca cggcaccagc tactcccggc acggcgcgtt cctggacggg gcggccgact
38401 tcgacgcggc gttcttcggg atctcgccgc gtgaggcgtt ggcgatggat ccgcagcagc
38461 ggcaggtcct ggagacgacg tgggagctgt tcgagaacgc cggcatcgac ccgcactccc
38521 tgcgcggtac ggacaccggt gtcttcctcg gcgctgcgta ccaggggtac ggccagaacg
38581 cgcaggtgcc gaaggagagt gagggttacc tgctcaccgg tggttcctcg gcggtcgcct
38641 ccggtcggat cgcgtacgtg ttggggttgg aggggccggc gatcactgtg gacacggcgt
38701 gttcgtcgtc gcttgtggcg ttgcacgtgg cggccgggtc gctgcgatcg ggtgactgtg
38761 ggctcgcggt ggcgggtggg gtgtcggtga tggccggtcc ggaggtgttc accgagttct
38821 ccaggcaggg cgcgctggcc cccgacggtc ggtgcaagcc cttctccgac caggccgacg
38881 ggttcggatt cgccgagggc gtcgctgtgg tgctcctgca gcggttgtcg gtggcggtgc
38941 gggaggggcg tcgggtgttg ggtgtggtgg tgggttcggc ggtgaatcag gatggggcga
39001 gtaatgggtt ggcggcgccg tcgggggtgg cgcagcagcg ggtgattcgg cgggcgtggg
39061 gtcgtgcggg tgtgtcgggt ggggatgtgg gtgtggtgga ggcgcatggg acggggacgc
39121 ggttggggga tccggtggag ttgggggcgt tgttggggac gtatggggtg ggtcggggtg
39181 gggtgggtcc ggtggtggtg ggttcggtga aggcgaatgt gggtcatgtg caggcggcgg
39241 cgggtgtggt gggtgtgatc aaggtggtgt tggggttggg tcgggggttg gtgggtccga
39301 tggtgtgtcg gggtgggttg tcggggttgg tggattggtc gtcgggtggg ttggtggtgg
39361 cggatggggt gcgggggtgg ccggtgggtg tggatggggt gcgtcggggt ggggtgtcgg
39421 cgtttggggt gtcggggacg aatgctcatg tggtggtggc ggaggcgccg gggtcggtgg
39481 tgggggcgga acggccggtg gaggggtcgt cgcgggggtt ggtgggggtg gctggtggtg
39541 tggtgccggt ggtgctgtcg gcaaagaccg aaaccgccct gaccgagctc gcccgacgac
39601 tgcacgacgc cgtcgacgac accgtcgccc tcccggcggt ggccgccacc ctcgccaccg
39661 gacgcgccca cctgccctac cgggccgccc tgctggcccg cgaccacgac gaactgcgcg
39721 acaggctgcg ggcgttcacc actggttcgg cggctcccgg tgtggtgtcg ggggtggcgt
39781 cgggtggtgg tgtggtgttt gtttttcctg gtcagggtgg tcagtgggtg gggatggcgc
39841 gggggttgtt gtcggttccg gtgtttgtgg agtcggtggt ggagtgtgat gcggtggtgt
39901 cgtcggtggt ggggttttcg gtgttggggg tgttggaggg tcggtcgggt gcgccgtcgt
39961 tggatcgggt ggatgtggtg cagccggtgt tgttcgtggt gatggtgtcg ttggcgcggt
40021 tgtggcggtg gtgtggggtt gtgcctgcgg cggtggtggg tcattcgcag ggggagatcg
40081 cggcggcggt ggtggcgggg gtgttgtcgg tgggtgatgg tgcgcgggtg gtggcgttgc
40141 gggcgcgggc gttgcgggcg ttggccggcc acggcggcat ggtctccctc gcggtctccg
40201 ccgaacgcgc ccgggagctg atcgcaccct ggtccgaccg gatctcggtg gcggcggtca
40261 actccccgac ctcggtggtg gtctcgggtg acccacaggc cctcgccgcc ctcgtcgccc
40321 actgcgccga gaccggtgag cgggccaaga cgctgcctgt ggactacgcc tcccactccg
40381 cccacgtcga acagatccgc gacacgatcc tcaccgacct ggccgacgtc acggcgcgcc
40441 gacccgacgt cgccctctac tccacgctgc acggcgcccg gggcgccggc acggacatgg
40501 acgcccggta ctggtacgac aacctgcgct caccggtgcg cttcgacgag gccgtcgagg
40561 ccgccgtcgc cgacggctac cgggtcttcg tcgagatgag cccacacccg gtcctcaccg
40621 ccgcggtgca ggagatcgac gacgagacgg tggccatcgg ctcgctgcac cgggacaccg
40681 gcgagcggca cctggtcgcc gaactcgccc gggcccacgt gcacggcgta ccagtggact
40741 ggcgggcgat cctccccgcc acccacccgg ttcccctgcc gaactacccg ttcgaggcga
40801 cccggtactg gctcgccccg acggcggccg accaggtcgc cgaccaccgc taccgcgtcg
40861 actggcggcc cctggccacc accccggcgg agctgtccgg cagctacctc gtcttcggcg
40921 acgccccgga gaccctcggc cacagcgtcg agaaggccgg cgggctcctc gtcccggtgg
40981 ccgctcccga ccgggagtcc ctcgcggtcg ccctggacga ggcggccgga cgactcgccg
41041 gtgtgctctc cttcgccgcc gacaccgcca cccacctggc ccggcaccga ctcctcggcg
41101 aggccgacgt cgaggcccca ctctggctgg tcaccagcgg cggcgtcgca ctcgacgacc
41161 acgacccgat cgactgcgac caggcaatgg tgtgggggat cggacgggtg atgggtctgg
41221 agaccccgca ccggtggggc ggcctggtgg acgtgaccgt cgaacccacc gccgaggacg
41281 gggtggtctt cgccgccctc ctggccgccg acgaccacga ggaccaggtg gcgctgcgcg
41341 acggcatccg ccacggccga cggctcgtcc gcgccccgct gaccacccga aacgccaggt
41401 ggacaccggc gggcacggcg ctcgtcacgg gcggtacggg tgccctcggc ggccacgtcg 
41461 cgcggtacct ggcccggtcc ggggtgaccg atctcgtcct gctcagcagg agcggccccg 
41521 acgcacccgg tgccgccgaa ctggccgccg aactggccga cctcggggcc gagccgagag 
41581 tcgaggcgtg cgacgtcacc gacgggccac gcctgcgcgc cctggtgcag gagctacggg 
41641 aacaggaccg gccggtccgg atcgtcgtcc acaccgcagg ggtgcccgac tcccgtcccc 
41701 tcgaccggat cgacgaactg gagccggtca gcgccgcgaa ggtgaccggg gcgcggctgc 
41761 tcgacgagct ctgcccggac gccgacacct tcgtcctgtt ctcctcgggg gcgggagtgt 
41821 ggggtagcgc gaacctgggc gcgtacgcgg cagccaacgc ctacctggac gccctggccc 
41881 accgccgccg ccaggcgggc cgggccgcga cctcggtcgc ctggggggcg tgggccggcg 
41941 acggcatggc caccggcgac ctcgacgggc tgacccggcg cggtctgcgg gcgatggcac 
42001 cggaccgggc gctgcgcgcc tgcaccaggc gttggaccac ccacgacacc tgtgtgtcgg 
42061 tagccgacgt cgactgggac cgcttcgccg tgggtttcac cgccgcccgg cccagacccc 
42121 tgatcgacga actcgtcacc tccgcgccgg tggccgcccc caccgctgcg gcggccccgg 
42181 tcccggcgat gaccgccgac cagctactcc agttcacgcg ctcgcacgtg gccgcgatcc 
42241 tcggtcacca ggacccggac gcggtcgggt tggaccagcc cttcaccgag ctgggcttcg 
42301 actcgctcac cgccgtcggc ctgcgcaacc agctccagca ggccaccggg cggacgctgc 
42361 ccgccgccct ggtgttccag caccccacgg tacgcagact cgccgaccac ctcgcgcagc 
42421 agctcgacgt cggcaccgcc ccggtcgagg cgacgggcag cgtcctgcgg gacggctacc 
42481 ggcgggccgg gcagaccggc gacgtccggt cgtacctgga cctgctggcg aacctgtcgg 
42541 agttccggga gcggttcacc gacgcggcga gcctgggcgg acagctggaa ctcgtcgacc 
42601 tggccgacgg atccggcccg gtcactgtga tctgttgcgc gggcactgcg gcgctctccg 
42661 ggccgcacga gttcgcccga ctcgcctcgg cgctgcgcgg caccgtgccg gtgcgcgccc 
42721 tcgcgcaacc cgggcacgag gcgggtgaac cggtgccggc gtcgatggag gcagtgctcg 
42781 gggtgcaggc ggacgcggtc ctcgcggcac agggcgacac gccgttcgtg ctggtcggac
42841 actcggcggg ggccctgatg gcgtacgccc tggcgaccga gctggccgac cggggccacc
42901 cgccacgtgg cgtcgtgctc ctcgacgtgt acccacccgg tcaccaggag gcggtgcacg 
42961 cctggctcgg cgagctgacc gccgccctgt tcgaccacga gaccgtacgg atggacgaca 
43021 cccggctcac ggccctgggg gcgtacgaca ggctgaccgg caggtggcgt ccgagggaca 
43081 ccggtctgcc cacgctggtg gtggccgcca gcgagccgat gggggagtgg ccggacgacg 
43141 gttggcagtc cacgtggccg ttcgggcacg acagggtcac ggtgcccggt gaccacttct 
43201 cgatggtgca ggagcacgcc gacgcgatcg cgcggcacat cgacgcctgg ttgagcgggg 
43261 agagggcatg aacacgaccg atcgcgccgt gctgggccga cgactccaga tgatccgggg 
43321 actgtactgg ggttacggca gcaacggaga cccgtacccg atgctgttgt gcgggcacga 
43381 cgacgacccg caccgctggt accgggggct gggcggatcc ggggtccggc gcagccgtac 
43441 cgagacgtgg gtggtgaccg accacgccac cgccgtgcgg gtgctcgacg acccgacctt 
43501 cacccgggcc accggccgga cgccggagtg gatgcgggcc gcgggcgccc cggcctcgac 
43561 ctgggcgcag ccgttccgtg acgtgcacgc cgcgtcctgg gacgccgaac tgcccgaccc 
43621 gcaggaggtg gaggaccggc tgacgggtct cctgcctgcc ccggggaccc gcctggacct 
43681 ggtccgcgac ctcgcctggc cgatggcgtc gcggggggtc ggcgcggacg accccgacgt 
43741 gctgcgcgcc gcgtgggacg cccgggtcgg cctcgacgcc cagctcaccc cgcagcccct 
43801 ggcggtgacc gaggcggcga tcgccgcggt gcccggggac ccgcaccggc gggcgctgtt 
43861 caccgccgtc gagatgacag ccaccgcgtt cgtcgacgcg gtgctggcgg tgaccgccac 
43921 ggcgggggcg gcccagcgtc tcgccgacga ccccgacgtc gccgcccgtc tcgtcgcgga 
43981 ggtgctgcgc ctgcatccga cggcgcacct ggaacggcgt accgccggca ccgagacggt 
44041 ggtgggcgag cacacggtcg cggcgggcga cgaggtcgtc gcggtggtcg ccgccgccaa 
44101 ccgtgacgcg ggggtcttcg ccgacccgga ccgcctcgac ccggaccggg ccgacgccga 
44161 ccgggccctg tccgcccagc gcggtcaccc cggccggttg gaggagctgg tggtggtcct 
44221 gaccaccgcc gcactgcgca gcgtcgccaa ggcgctgccc ggtctcaccg ccggtggccc 
44281 ggtcgtcagg cgacgtcgtt caccggtcct gcgagccacc gcccactgcc cggtcgaact 
44341 ctgaggtgcc tgcgatgcgc gtcgtcttct cctccatggc cagcaagagc cacctgttcg 
44401 gtctcgttcc cctcgcctgg gccttccgcg cggcgggcca cgaggtacgg gtcgtcgcct 
44461 caccggctct caccgacgac atcacggcgg ccggactgac ggccgtaccg gtcggcaccg 
44521 acgtcgacct tgtcgacttc atgacccacg ccgggtacga catcatcgac tacgtccgca 
44581 gcctggactt cagcgagcgg gacccggcca cctccacctg ggaccacctg ctcggcatgc 
44641 agaccgtcct caccccgacc ttctacgccc tgatgagccc ggactcgctg gtcgagggca 
44701 tgatctcctt ctgtcggtcg tggcgacccg actggtcgtc tggaccgcag accttcgccg 
44761 cgtcgatcgc ggcgacggtg accggcgtgg cccacgcccg actcctgtgg ggacccgaca 
44821 tcacggtacg ggcccggcag aagttcctcg ggctgctgcc cggacagccc gccgcccacc 
44881 gggaggaccc cctcgccgag tggctcacct ggtctgtgga gaggttcggc ggccgggtgc 
44941 cgcaggacgt cgaggagctg gtggtcgggc agtggacgat cgaccccgcc ccggtcggga 
45001 tgcgcctcga caccgggctg aggacggtgg gcatgcgcta cgtcgactac aacggcccgt 
45061 cggtggtgcc ggactggctg cacgacgagc cgacccgccg acgggtctgc ctcaccctgg 
45121 gcatctccag ccgggagaac agcatcgggc aggtctccgt cgacgacctg ttgggtgcgc
45181 tcggtgacgt cgacgccgag atcatcgcga cagtggacga gcagcagctc gaaggcgtcg
45241 cccacgtccc ggccaacatc cgtacggtcg ggttcgtccc gatgcacgca ctgctgccga 
45301 cctgcgcggc gacggtgcac cacggcggtc ccggcagctg gcacaccgcc gccatccacg 
45361 gcgtgccgca ggtgatcctg cccgacggct gggacaccgg ggtccgcgcc cagcggaccg 
45421 aggaccaggg ggcgggcatc gccctgccgg tgcccgagct gacctccgac cagctccgcg 
45481 aggcggtgcg gcgggtcctg gacgatcccg ccttcaccgc cggtgcggcg cggatgcggg 
45541 ccgacatgct cgccgagccg tcccccgccg aggtcgtcga cgtctgtgcg gggctggtcg 
45601 gggaacggac cgccgtcgga tgagcaccga cgccacccac gtccggctcg gccggtgcgc 
45661 cctgctgacc agccggctct ggctgggtac ggcagccctc gccggccagg acgacgccga 
45721 cgcagtacgc ctgctcgacc acgcccgttc ccggggcgtc aactgcctcg acaccgccga
45781 cgacgactct gcgtcgacca gtgcccaggt cgccgaggag tcggtcggcc ggtggttggc 
45841 cggggacacc ggtcggcggg aggagaccgt cctgtcggtg acggtgggtg tcccaccggg 
45901 cgggcaggtc ggcgggggcg gcctctccgc ccggcagatc atcgcctcct gtgagggctc 
45961 cctgcggcgt ctcggtgtcg accacgtcga cgtccttcac ctgccccggg tggaccgggt 
46021 ggagccgtgg gacgaggtct ggcaggcggt ggacgccctc gtggccgccg gaaaggtctg 
46081 ttacgtcggg tcgtcgggct tccccggatg gcacatcgtc gccgcccagg agcacgccgt 
46141 ccgccgtcac cgcctcggcc tggtgtccca ccagtgtcgg tacgacctga cgtcgcgcca 
46201 tcccgaactg gaggtcctgc ccgccgcgca ggcgtacggg ctcggggtct tcgccaggcc 
46261 gacccgcctc ggcggtctgc tcggcggcga cggtccgggc gccgcagccg cacgggcgtc 
46321 gggacagccg acggcactgc gctcggcggt ggaggcgtac gaggtgttct gcagagacct 
46381 cggcgagcac cccgccgagg tcgcactggc gtgggtgctg tcccggcccg gtgtggcggg
46441 ggcggtcgtc ggtgcgcgga cgcccggacg gctcgactcc gcgctccgcg cctgcggcgt 
46501 cgccctcggc gcgacggaac tcaccgccct ggacgggatc ttccccgggg tcgccgcagc 
46561 aggggcggcc ccggaggcgt ggctacggtg agagcccgcc cctgacctgc gggaacccgt 
46621 gtcggtgcgg cgggacggcc gccgcggtcc ccgccccggt cagccggtgg gggtgagccg 
46681 cagcaggtcc ggcgccaccg actcggccac ctccccgacg tggtcggcga ggtagaagtg 
46741 cccgcccggg aaggtccggg tacggccggg gactaccgag tacggcagcc agcgttgggc 
46801 gtcctccacc gtcgtcaacg ggtcggtgtc accgcagagg gtggtgatgc cggcccgcag 
46861 cggcggcccg gcctgccagg cgtaggagcg cagcacccgg tggtcggccc gcagcaccgg 
46921 cagcgacatg tccaacagcc cctggtcggc caatgcggcc tcgctgaccc cgagcctgcg
46981 catctgctcg acgagtccgt cctcgtcggg caggtcggtg cgccgctcgt ggacccgggg 
47041 ggcggtctgc ccggagacga acaaccgcag cggtcgcacc cccggacgag cctccaggcg 
47101 acgggcggtc tcgtaggcga ccagggcgcc catgctgtga ccgaacaggg cgaacggaac 
47161 ctcgccgacg aggtcgcgca gcacggccgc gacctcgtcg gcgatctccc cggcggtgcc 
47221 gagagcccgc tcgtcacgtc ggtcctgccg gcccgggtac tgcaccgccc acacgtcgac 
47281 ctccggggcc agtgcccggg cgaggtcgag gtacgagtcg gcggcggctc ccgcgtgcgg 
47341 gaagcagtac agccgggccc ggtgtccgtc ggcggacccg aaccgccgca accaggtgtt 
47401 catcggtgtc tcatccgttc ggtcgcaccg gcaggtggtc gatgccgcgc agcaggagcg 
47461 accgccgcca gacaacctcg tcggagggga agcccagcga cagcttcggg aagcggtcga 
47521 acagggcccc cagggcgacc tctccctcca gcttggccag cgggcggccc atgcagtagt 
47581 ggatgccgtg cccgaaggtg aggtgtcccc ggctgtccct ggtgacgtcg aaccggtcgg 
47641 ggtcggggaa ctgtcccggg tcgcggttgg ccgccccgtt ggcgatcagg acggtgctgt
47701 acgccgggat cgtcaccccg ccgatctcca cctcggcggt ggcgaaccgg gtggtggtct 
47761 ccggtggggc ctggtagcgc aggatctcct ccaccgctcc gggcagcagt gccgggtcct 
47821 tccggaccag cgcgagctgg tcggggtggg tcagcagcag gtaggtgccg atcccgatga 
47881 ggctcaccga cgcctcgaat cccgccagca gcagcaccag cgcgatggag gtgagttcgt 
47941 cgcggctgag ccggtcggcg tcgtcgtcct ggacccggat c 

(SEQ ID NO: 1) 
TDP-L-megosamine
(-rhodosamine) 



FIGURE 10 
SEQUENCE LISTING



<110> Kosan Biosciences, Inc.



<120> Recombinant Megalomicin Biosynthetic 
Genes and Uses Thereof 

<130> 300622004740 

<140> To be assigned
<141> Herewith 

<150> US 60/158,305
<151> 1999-10-08 

<150> US 60/190,024 
<151> 2000-03-17 

<160> 34 

<170> FastSEQ for Windows Version 4.0 
<210> 1 

<211> 47981
<212> DNA 

<213> Micromonospora megalomicea 

<220> 

<221> CDS 

<222> (1) . . . (144) 

<223> megBVI (megT) , TDP-4-keto-6-deoxyglucose-2, 3-dehydratase; 
SEQ ID NO: 2= translated amino acid sequence 

<221> CDS 

<222> (928) . . . (2061) 

<223> megDVI, TDP-4-keto-6-deoxyglucose 3, 4-isomerase, 
TDP-4-keto-6-deoxyhexose 3, 4-isomerase; 
SEQ ID NO: 3= translated amino acid sequence 

<221> CDS 

<222> (2072) . . . (3382) 

<223> megDI, rhodosaminyl transferase (eryCIII homolog),
TDP-megosamine glycosyltransferase;
SEQ ID NO: 4= translated amino acid sequence 

<221> CDS 

<222> (3462) . . . (4634) 

<223> megG (megY), mycarosyl acyltransferase, mycarose O-acyltransferase;
SEQ ID NO: 5= translated amino acid sequence 

<221> CDS 

<222> (4651) . . . (5775) 

<223> megDII, deoxysugar transaminase (eryCI, DnrJ homolog),
TDP-3-keto-6-deoxyhexose 3-aminotransaminase; 
SEQ ID NO: 6= translated amino acid sequence 

<221> CDS 

<222> (5822) ... (6595) 

<223> megDIII, daunosaminyl-N, N-dimethyltransferase (eryCVI homolog);
SEQ ID NO: 7= translated amino acid sequence 
<221> CDS

<222> (6592) . . . (7197)

<223> megDIV, TDP-4-keto-6-deoxyglucose 3, 5-epimerase (eryBVII, dnmU
homolog), TDP-4-keto-6-deoxyhexose 3, 5-epimerase;
SEQ ID NO: 8= translated amino acid sequence

<221> CDS

<222> (7220) . . . (8206)

<223> megDV, TDP-hexose 4-ketoreductase (eryBIV, dnmV homolog),
TDP-4-keto-6-deoxyhexose 4-ketoreductase;
SEQ ID NO: 9= translated amino acid sequence

<221> CDS 

<222> (8228) . . . (9220) 

<223> megBII-1 (megDVII) , TDP-4-keto-L-6-deoxy-hexose 2, 3-reductase; 
SEQ ID NO: 10= translated amino acid sequence 

<221> CDS 

<222> (9226) . . . (10479) 

<223> megBV, mycarosyl transferase, mycarose glycosyltransferase; 
SEQ ID NO: 11= translated amino acid sequence 

<221> CDS 

<222> (10483) . . . (11424)

<223> megBIV, TDP-hexose 4-ketoreductase,
TDP-4-keto-6-deoxyhexose 4-ketoreductase;
SEQ ID NO: 12= translated amino acid sequence

<221> CDS 

<222> (12181) . . . (22821) 

<223> megAI; SEQ ID NO: 13= translated amino acid sequence 

<221> misc_feature 
<222> (12505)... (13470) 
<223> megAI, AT-L 

<221> misc_feature 
<222> (13576) . . . (13791) 
<223> megAI, ACP-L 

<221> misc_feature 
<222> (13849) ... (15126) 
<223> megAI, KS1 

<221> misc_feature 
<222> (15427) . . . (16476) 
<223> megAI, AT1

<221> misc_feature
<222> (17155) . . . (17694)
<223> megAI, KR1

<221> misc_feature
<222> (17947) . . . (18207)
<223> megAI, ACP1

<221> misc_feature 
<222> (18268) .. . (19548) 
<223> megAI, KS2 

<221> misc_feature
<222> (19876) . . . (20910)
<223> megAI, AT2

<221> misc_feature
<222> (21517) . . . (22053)
<223> megAI, KR2

<221> misc_feature
<222> (22318) . . . (22575)
<223> megAI, ACP2

<221> CDS 

<222> (22867) . . . (33555) 

<223> megAII; SEQ ID NO: 14= translated amino acid sequence 

<221> misc_feature 
<222> (22957) . . . (24237) 
<223> megAII, KS3 

<221> misc_feature 
<222> (24544) . . . (25581) 
<223> megAII, AT3

<221> misc_feature
<222> (26230) . . . (26733)
<223> megAII, KR3 (inactive)

<221> misc_feature
<222> (26998) . . . (27258)
<223> megAII, ACP3

<221> misc_feature
<222> (27393) . . . (28590)
<223> megAII, KS4

<221> misc_feature
<222> (28897) . . . (29931)
<223> megAII, AT4

<221> misc_feature 
<222> (29953) . . . (30477) 
<223> megAII, DH4 

<221> misc_feature 
<222> (31396) ... (32244) 
<223> megAII, ER4 

<221> misc_feature
<222> (32257) . . . (32799)
<223> megAII, KR4

<221> misc_feature
<222> (33052) . . . (33312)
<223> megAII, ACP4

<221> CDS

<222> (33666) . . . (43271)

<223> megAIII; SEQ ID NO: 15= translated amino acid sequence

<221> misc_feature
<222> (33780) . . . (35027)
<223> megAIII, KS5

<221> misc_feature
<222> (35385) . . . (36419)
<223> megAIII, AT5

<221> misc_feature 
<222> (37068) . . . (37604) 
<223> megAIII, KR5 

<221> misc_feature 
<222> (37860) . . . (38120) 
<223> megAIII, ACP5 

<221> misc_feature 
<222> (38187) . . . (39470) 
<223> megAIII, KS6 

<221> misc_feature 
<222> (39795) . . . (40811) 
<223> megAIII, AT6

<221> misc_feature 
<222> (41406) . . . (41936) 
<223> megAIII, KR6 

<221> misc_feature 
<222> (42168) . . . (42425) 
<223> megAIII, ACP6 

<221> misc_feature 
<222> (42585) . . . (43271) 
<223> megAIII, TE 

<221> CDS 

<222> (43268) . . . (44344) 

<223> megCII, TDP-4-keto-6-deoxyglucose 3, 4-isomerase; 
SEQ ID NO: 16= translated amino acid sequence 

<221> CDS 

<222> (44355) . . . (45623) 

<223> megCIII, desosaminyl transferase, desosamine glycosyltransferase; 
SEQ ID NO: 17= translated amino acid sequence 

<221> CDS 

<222> (45620) . . . (46591)

<223> megBII-2 (megBII), TDP-4-keto-6-deoxy-L-glucose 2,3-dehydratase,
TDP-4-keto-6-deoxyglucose 2,3-dehydratase;
SEQ ID NO: 18= translated amino acid sequence

<221> CDS 

<222> (46660) . . . (47403) 

<223> megH, TEH; SEQ ID NO: 19= translated amino acid sequence 
<221> CDS 

<222> (47411) . . . (47980) 

<223> megF, C-6 hydroxylase; SEQ ID NO: 20= translated amino acid sequence 
<400> 1 

ctcgagccga tgctcggcgg cgcggtgggc caaccagtcg tggacgtcgt cggtggcggt 60
gggaggtccg ccgtgccgag tcaggaaacg tattgccgat tgtgtggatt ccggagtcgc 120
atgaccgttg acccgatccc ccatacgcct ctcccgtgat gtcgtgggcg gtccgtgcgg 180

taccgcccgg actgacattc gtcgatcaag accccgccca gtgtagggct ccgcccgcga 240 

cgggagaagg tccgtcgaac aacttccggg tgaccggtcg ccggcgtcgg tgaaacgggc 300 

gtcggagcac ccgatcattg ctgtcggtga acttcctaac tgtcggcgcg cacatctttc 360 

tgaccggtgt gttccgtggt atgacgcgtt cccggcccgt ctggaactgt gcgtgggact 420 

gaccggttgc ggcgtgtttt cgcccgtttc cgaactgcgg attcgtcgat cgcgcaggtg 480 

ggagcgggtg gctgaccggg atgatctgca atcatggcgc tcaatgacga tctcttgtag 540 

catggtccgc gccgagggtc cgacaggccc gaaacgcccg gcatccagcc tgttcgacga 600 

cgtcgacatc accgtgcaag ccgcgatgac accgacacca cgccatgctg gtgccgcact 660 

ggaagggtgg cgcgatcagg gaaatggccg tgtcactaga cagacgccaa acagctgtcc 720 

gggcctgcgg aaacagcatc gatctgcgtc agccgttcat tgccccggcg gcaccgcctt 780 

ggaaatccgt gccaccggtc gtccgcagtg acgatcgcgg acccgggttt cgagacagca 840 

ggtagtaggc gatgcaggcg tttcgtctcg cgccggacgc gtcgcactag gtggaatccg 900 

tcacagtctt caatccggga gcgttctatg gcagttggcg atcgaaggcg gctgggccgg 960 

gagttgcaga tggcccgggg tctctactgg gggttcggtg ccaacggcga tctgtactcg 1020 

atgctcctgt ccggacggga cgacgacccc tggacctggt acgaacggtt gcgggccgcc 1080 

ggacggggac cgtacgccag tcgggccgga acgtgggtgg tcggtgacca ccggaccgcc 1140 

gccgaggtgc tcgccgatcc gggcttcacc cacggcccgc ccgacgctgc ccggtggatg 1200 

caggtggccc actgcccggc ggcctcctgg gccggcccct tccgggagtt ctacgcccgc 1260 

accgaggacg cggcgtcggt gacagtggac gccgactggc tccagcagcg gtgcgccagg 1320 

ctggtgaccg agctggggtc gcgcttcgat ctcgtgaacg acttcgcccg ggaggtcccg 1380 

gtgctggcgc tcggtaccgc gcccgcactc aagggcgtgg accccgaccg tctccggtcc 1440 

tggacctcgg cgacccgggt atgcctggac gcccaggtca gcccgcaaca gctcgcggtg 1500 

accgaacagg cgctgaccgc cctcgacgag atcgacgcgg tcaccggcgg tcgggacgcc 1560 

gcggtgctgg tgggggtggt ggcggagctg gcggccaaca cggtgggcaa cgccgtcctg 1620 

gccgtcaccg agcttcccga actggcggca cgacttgccg acgacccgga gaccgcgacc 1680 

cgtgtggtga cggaggtgtc gcggacgagt cccggcgtcc acctggaacg ccgcaccgcc 1740 

gcgtcggacc gccgggtggg cggggtcgac gtcccgaccg gtggcgaggt gacagtggtc 1800 

gtcgccgcgg cgaaccgtga tcccgaggtc ttcaccgatc ccgaccggtt cgacgtggac 1860 

cgtggcggcg acgccgagat cctgtcgtcc cggcccggct cgccccgcac cgacctcgac 1920 

gccctggtgg ccaccctggc cacggcggcg ctgcgggccg ccgcgccggt gttgccccgg 1980 

ctgtcccgtt ccgggccggt gatcagacga cgtcggtcac ccgtcgcccg tggtctcagc 2040

cgttgcccgg tcgagctgta gaggaagaac gatgcgcgtc gtgttttcat cgatggctgt 2100 

caacagccat ctgttcgggc tggtcccgct cgcaagcgcc ttccaggcgg ccggacacga 2160 

ggtacgggtc gtcgcctcgc cggccctgac cgacgacgtc accggtgccg gtctgaccgc 2220 

cgtgcccgtc ggtgacgacg tggaacttgt ggagtggcac gcccacgcgg gccaggacat 2280 

cgtcgagtac atgcggaccc tcgactgggt cgaccagagc cacaccacca tgtcctggga 2340 

cgacctcctg ggcatgcaga ccaccttcac cccgaccttc ttcgccctga tgagccccga 2400 

ctcgctcatc gacgggatgg tcgagttctg ccgctcctgg cgtcccgact ggatcgtctg 2460 

ggagccgctg accttcgccg ccccgatcgc ggcccgggtc accggaaccc cgcacgcccg 2520 

gatgctgtgg ggtccggacg tcgccacccg ggcccggcag agcttcctgc gactgctggc 2580 

ccaccaggag gtggagcacc gggaggatcc gctggccgag tggttcgact ggacgctgcg 2640 

gcgcttcggc gacgacccgc acctgagctt cgacgaggaa ctggtgctgg ggcagtggac 2700 

cgtggacccc atccccgagc cgctgcggat cgacaccggc gtccggacgg tgggcatgcg 2760 

gtacgtcccc tacaacggcc cctcggtggt gcccgcctgg ctgttgcggg aacccgaacg 2820 

tcggcgggtc tgcctgaccc tcggcggttc cagccgggaa cacggcatcg ggcaggtctc 2880 

catcggcgag atgttggacg ccatcgccga catcgacgcc gagttcgtgg ccaccttcga 2940 

cgaccagcag ttggtcggcg tgggcagcgt tccggcaaac gtccgtaccg ccgggttcgt 3000 

gccgatgaac gtcctgctgc ccacctgcgc ggccaccgtg caccacggcg gcaccggcag 3060 

ttggctgacc gccgccatcc acggcgtacc gcagatcatc ctctcggacg ccgacaccga 3120 

ggtgcacgcc aagcagctcc aggacctcgg cgcggggctg tcgctcccgg tcgcggggat 3180 

gaccgccgag cacctgcgtg gggcgatcga gcgggttctc gacgagccgg cgtaccgcct 3240 

cggtgcggag cggatgcggg acgggatgcg gaccgacccg tcgccggccc aggtggtcgg 3300 

catctgtcag gacctggccg ccgaccgggc ggcacgcggc aggcagccgc gtcgaaccgc 3360 

cgagccgcac ctgccgcgat gacttccacc accaccggga ccggctgatg ccggtcccgg 3420 

aatccacacg ccgactttcc ttctgacacg agggggcccc ggtggttacc tccaccaact 3480 

tggacacgac agcacggccg gcactgaact cgttgaccgg gatgcggttc gtcgccgcct 3540 

tcctggtctt cttcacgcac gtcctgtcga ggctcatccc gaacagctac gtgtacgccg 3600 

acggcctgga cgccttctgg cagaccaccg gacgggtggg ggtgtcgttc ttctttattc 3660 

tcagcggttt cgtgctgacc tggtcggcgc gggccagcga ctcggtgtgg tcgttctggc 3720

gcagacgggt ctgcaagctc ttccccaacc acctggtcac cgccttcgcc gccgtggtgt 3780 






tgttcctggt caccgggcag gcggtgagcg gtgaggcgct gatcccgaac ctcctgctga 3840
tccacgcctg gttcccggcc ctggagatct ccttcggcat caacccggtg agctggtcgt 3900
tggcctgcga ggcgttcttc tacctgtgct tcccgctgtt cctgttctgg atctccggta 3960
tccgcccgga gcggctgtgg gcctgggccg ccgtggtgtt cgccgcgatc tgggcggtac 4020
cggtggtcgc cgacctcctg ctgccgagtt ccccgccgct gatcccgggg cttgagtact 4080
ccgccatcca ggactggttc ctctacacct tccctgcgac gcggagcctg gagttcatcc 4140
tcgggatcat cctggcccgc atcctgatca ccggtcggtg gatcaacgtc gggctgctcc 4200
ccgcggtgct gttgttcccg gtcttcttcg tcgcctcgct cttcctgccg ggtgtctacg 4260
ccatctcctc gtcgatgatg atccttcccc tggttctgat catcgccagc ggcgcgacgg 4320
ccgacctcca gcagaagcgc accttcatgc gtaaccgggt gatggtgtgg ctcggcgacg 4380
tctccttcgc gctctacatg gtccacttcc tggtgatcgt ctacggggcg gacctgctgg 4440
ggttcagcca gaccgaggac gccccgctgg gtctcgcact cttcatgatc attccgttcc 4500
tcgcggtctc cctggtgctg tcgtggctgc tgtacaggtt cgtcgagcta cccgtcatgc 4560
gtaactgggc ccgcccggcc tccgcccggc gcaaacccgc cacggaaccc gaacagaccc 4620
cttcccgccg gtaagaagga cggtgcatcg gtgaccacct acgtctggtc ctatctgttg 4680
gagtacgaga gggaacgagc cgacatcctc gatgcggtgc agaaggtctt cgccagtggc 4740
agcctgatcc tcggtcagag tgtggagaac ttcgagaccg agtacgcccg ctaccacggg 4800
atcgcgcact gcgtgggcgt cgacaacggc accaacgctg tgaaactcgc gctggagtcg 4860
gtaggtgtcg gacgcgacga cgaggtcgtc acggtctcca acaccgccgc ccccacagtc 4920
ctggccatcg acgagatcgg cgcccggccg gtcttcgtgg acgtccgcga cgaggactac 4980
ctcatggaca ccgacctggt ggaggcggcg gtcaccccgc gtaccaaggc catcgtcccg 5040
gtgcacctgt acgggcagtg cgtggacatg acagccctgc gggaactggc cgaccggcgg 5100
ggcctcaagc tcgtggagga ctgcgcccag gcccacggtg cccggcggga cggtcggctg 5160
gccgggacga tgagcgacgc ggcggccttc tcgttctacc cgacgaaggt cctcggcgcc 5220
tacggcgacg gcggcgcggt cgtcaccaac gacgacgaga cagcccgcgc cctgcgacgg 5280
ctgcggtact acgggatgga ggaggtctac tacgtcaccc ggaccccggg tcacaacagc 5340
cgcctcgacg aggtgcaggc cgagatcctg cggcgcaaac tgacccggct cgacgcgtac 5400
gtcgcgggtc ggcgggcggt cgcccagcgg tacgtcgacg ggctcgccga cctccaagac 5460
tcgcacggcc tcgaactccc agtggtcacc gacggcaacg aacacgtctt ctacgtgtac 5520
gtcgtccgcc acccgcgccg cgacgagatc atcaagcgtc tccgggacgg gtacgacatc 5580
tccctgaaca tcagctaccc ctggccggtg cacaccatga ccggcttcgc ccacctcggt 5640
gtcgcgtcgg ggtcgctgcc ggtcaccgaa cggctggccg gcgagatctt ctcccttccc 5700
atgtacccct ccctccctca cgacctgcag gacagggtga tcgaggcggt gcgggaggtc 5760
atcaccgggc tgtgacgagc ccgcgtgtcg tcagcgaaga cccactctgg aagggccggt 5820
catgccgaac agccactcga ccacgtcgag caccgacgtc gccccgtacg agcgggcgga 5880
catctaccac gacttctacc acggccgtgg caagggatac cgtgccgaag ccgacgcgct 5940
cgtggaggtc gcccgcaagc acaccccaca ggcggcgacc ctgctggacg tggcctgcgg 6000
gaccggatcc cacctggtcg agctggcgga cagcttccgg gaggtggtgg gggtcgacct 6060
gtcggccgcc atgctcgcca ccgccgcccg caacgacccc gggcgggaac tgcaccaggg 6120
cgacatgcgc gacttctccc tcgaccgcag gttcgacgtc gtcacctgca tgttcagctc 6180
caccggttac ctcgtcgacg aggccgaact ggaccgtgcc gtggcgaacc tggccggtca 6240
cctcgcgcct ggcggcaccc tcgtcgtgga gccctggtgg ttcccggaga cgttccggcc 6300
cggctgggtc ggggccgacc tggtcaccag cggtgaccgg aggatctccc ggatgtcgca 6360
caccgtcccg gcgggtctgc ccgaccgcac cgcctcccgg atgaccatcc actacacggt 6420
ggggtcaccg gaggccggga tcgagcactt caccgaggtg cacgtgatga ccctgttcgc 6480
ccgcgccgcc tacgagcagg ccttccagcg ggcgggcctg agctgctcgt acgtcggcca 6540
cgacctgttc tcgccgggcc ttttcgtcgg ggtcgccgcg gagccggggc ggtgagggtc 6600
gaggagctgg gcatcgaggg ggtcttcacc ttcaccccgc agacgttcgc cgacgagcgg 6660
ggggtgttcg gcacggcgta ccaggaggac gtgttcgtgg cggcgctcgg ccgcccgctg 6720
ttcccggtgg cccaggtcag caccacccgg tcccggcggg gtgtggtccg gggggtgcac 6780
ttcacgacga tgcccggctc catggcgaag tacgtctact gcgccagggg tagggcgatg 6840
gacttcgccg tcgacatccg gcccggttcc ccgaccttcg gccgggccga gccggtcgag 6900
ctctccgccg agtcgatggt cgggctgtac cttcccgtgg gcatgggcca cctgttcgtc 6960
tccctggagg acgacaccac cctcgtctac ctgatgtccg ccggttacgt ccccgacaag 7020
gaacgggcgg tgcaccccct ggatccggag ctggcgttgc cgatcccggc cgacctcgac 7080
ctcgtcatgt ccgagcggga ccgggtcgca cccaccctcc gggaggcccg ggaccagggg 7140
atcctgcccg actacgccgc ctgccgggcc gccgcgcacc gggtggtgcg gacgtgaccc 7200
cggccgggcg tgcgggccgg tggtggtgct cggcgcgtcg ggtttcctgg gttcggcggt 7260
cacccacgcc ctggccgacc tcccggtgcg ggtgcggctc gtcgcccggc gggaggtcgt 7320
cgtgccctcc ggtgccgtcg ccgactacga gacgcaccgg gtggacctca ccgaacccgg 7380
agcgctcgcg gaggtggtcg cggacgcccg ggcggtcttc ccgttcgccg cccagatcag 7440
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gggtacgtca gggtggcgga tcagcgagga 
cctggtccgg gacctgatcg ccgtcctgtc 
cccgggcagc aacacgcagg tcggcagggt 
gcaggaccac cccgagggcg tctacgacag 
ggaggccact gcggccgggg cgatccgggc 
ggtgcccgcc gccggcaccg ccgacgaccg 
cctgaccggc caaccgctga cgatgtggca 
cgtgaccgac gccgcccggg ccttcgtcac 
acgccacttc ctgttgggga cggggcgttc 
ctcgcgcagc gtcgcccggc acaccggcga 
tccggcgcac atggacccgt cggacctgcg 
ggctgtcacc gggtggcggg ccacggtcac 
ggcgttggcc ccccgccggg ccgccgcccc 
gttcgtccta cggcaccggc ccgtcgacgg 
ggagttcctc ctcgcccagc gtcagctcgg 
gtgtgcgggg gccgatgaca gcgcccagga 
cgacctcggc cgggtccgcg ccgaggcgtc 
ggcgtacggc ggggaggagc acctgggcgc 
ctgccaactt ctccagtacg ccgctgagca 
cgcccacccc gtacgcctgg gcggcgggca 
ggttgtacag gcactggtgg gagatcatgc 
gggcggcggc gatgtgccag cccgccaggt 
tgccgaccag atgttcggcg gcctgccaca 
ggtgcgtctg gtagatgtcg atgtggtcga 
cggcgacgat gtgtcgggcg gagagcccgc 
ccaccttggt cgccaggacg gtctcctcgc 
cgacgagttc ctcggtgtgg cccttgtaga 
tgcagttgac gccccgctcg agggcgtggt 
cccgtccact gaagttcacg gtgccgagcc 
cgacgcgtac ccgggcggac ccggccccgg 
gtgctggtgg gcgagcgcct ccagcacggg 
cgcctcctgc 'cgcagcttct cggcgttctc 
gagagcctgc cagagggtgt cggcgtcgac 
cagctcggcg gtgcgctgac cacgcaggac 
cggtacgccg tggtgcagcg cggtggccca 
ggcacagccc ggcagcagga tgttcatggg 
caccgacgcc ggatcgagcc cggagcgggt 
ggtggccagt gtccggagga actcctgcgg 
cccggtgaag cagacccggc ggactccgtc 
ggacccgttg tagggcaaag tccgggtgtg 
gctctcgggc agctggtcga cgctccactg 
gccgaaccgg ccggcgacct cggtgagcca 
gggacgctgc ccgcgcaggt cctgggagcg 
ccacagcagc cgggcgtggg cggccccgca 
gaagggctcc cagagcacca ggtcgggacg 
gacgaaggag tcgttgttga ccaccgggaa 
gtgcaggaac tcccacgagc gcagttccgg 
gtagcggtgc acctgcgcgg cggcctcagg 
gagtggcacc gaggtcagtc ccgcgccgac 
cacccggacg tcgtggccgg cggtgtgcag 
gtgggtacgg tgcgcgaacg aggtgagcag 
gagggcggca acggtccggt cgatgccctc 
cagcgtccgg aactcggtgg agtcgaagtc 
ctccggtgga gggacgctga cgacgggcac 
ggcggcgacg gtctcgaaga tctcgccgag 
gacgtcgccg accagcgcct cgtggttgtg 
ctcgacgtgc aggaggttgc ggcgcacgct 
ggcgagggct cgccggatca tggcggtgac 
gctgtggccg tagatcgcgg gcaggcgcag 
ggcctgacgc aggatccgct cggcctcgat 
ggggttcgcg gcctgggtgg tgctggcgaa 
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cgacgtggtc gccgaacgga cgaacgtcgg 7500 
ccgctcgccg cacgccccgg tggtggtctt 7560 

caccgccggc cgggtcatcg acggcagcga 7620 

gcagaaacac accggggaac agctgctcaa 7680 

gaccagtctg cggctgcccc cggtgttcgg 7740 

gggggtggtc tccaccatga tccgtcgggc 7800 

cgacggcacc gtccggcgtg aactgctgta 78 60 

cgccctggac cacgccgacg cgctcgccgg 7920 

ctggccgctg ggcgaggtct tccaggcggt 7980 

ggacccggtg ccggtggtct cggtgccgcc 8040 

cagcgtggag gtcgaccccg cccggttcac 8100 

gatggcggag gcggtcgacc ggacggtggc 8160 

gtccgagccc tcctgaccgg ggtcacccgg 8220 

ccggtgccgg gaagatcgct tcgagttccc 8280 

cggcccgtaa cgccgagtcg agctgctcgg 8340 

tcccggggcg ggacaggacc caggccagac 8400 

ggcagtagtc ctcgtacgcc tcgacgaggg 84 60 

gtccctgcgc cgacttgacg gcggttccgg 8520 

gcccgccgtg caggggggac caggcgaaca 8580 

ggacgtccag ctcggggtgg cggacggcca 8640 

cgagcaggtt gcggcgtgcc gcgctctcct 8700 

tggaggagcc gacgtacccg accttcccac 8760 

cctcgtccca cggtgcggcg cggtcgatgt 8820 

ccccgaggcg gcggagggag ttctcgcagg 8880 

cgtcgttgac ccgttcgctc atctcgctgc 8940 

gtcgacctcc gccctgggcg aaccaccgtc 9000 

gccgccagcc gtagatgtcg gcggtgtcga 9060 

ccatcagccg cagcgcgtcg tcgtcggtca 9120 

agagtcggct ggtgtgcaac gccgatcgtc 9180 

tggttcccac gtcggtcacc tgtcggcgcg 9240 

tacgacctcg gcgggggtcg gcgcggccag 9300 

ggcgtgggaa cggtcctcga ccactgtggc 9360 

ctcgtccgga cggaggaaga cacccgctcc 9420 

acagtcccac tcgtgggcga cggagatctg 9480 

gcttccggca ccgccgtggt ggatgacggc 9540 

aacgaagtcc accaggcgga cgttgtccgg 9600 

caccacgatc tcgccgtcga accgcgcgag 9660 

gttcgaggtg atgcccagcg ccgagtatcc 9720 

cgaggtcctg agccactgcg gcacgacgga 9780 

caccgactcc agtccggtct ccaggcggaa 9840 

tccgacagcg aggtcctcgc tgtagtcgag 9900 

gccgccgagc gggtccggcc ggtcgtcggc 9960 

gctgcggaag tagccggtga ggtcgctgcc 10020 

ggccttggcc gcgaccgccc cggcgaaggt 10080 

ccagtccatg gcgaactcga cgagttcgtc 10140 

gacgaaccgg gaggtggcct cctcgatgcc 10200 

tccgcgtcgg gcgaagtcca ggtcggtggt 10260 

ggagatgtcg aagagtcggt ggtccgagcc 10320 

gacgacgtcg gtgagctcgg gctgactggc 10380 

cgcccaggcc agggggacga ggccctggaa 10440 

gacccgcact ggtcactcct tggtcgagat 10500 

ggccagcggc acccgggggt gccagccggt 10560 

gtcgctgcgg aagtcgttgg cctcggcgtt 10620 

cgcagggttg ccggtctgac gtgccacgct 10680 

gggtcgggcc tcgtccgcgc tcggcgtcca 10740 

cagtgcggcg gtgaacgcgg tggccacgtc 10800 

gccctcgtgc cacatcgtga tcggctcacc 10860 

gacaccccgg ccggtctgcc ccgacgggcc 10920 

gatcaccccg tcgacgaccc cgtcctcggt 10980 

cttgtgctgg gcgtaccggc tgggggcggc 11040 

caggagcacc ggcgcgggtc cgggtcttgc 11100 

ccgcagcgcg gcgacgaggt cgcgcatgat gcccgcgttg acgcgttcgg cctcgggcac 11160 

cgtggcggcg ctgcgccagg tcgacccgcc ggcggcgtag gcgaccagat gcacgacgac 11220 

gtcggtgtcg gcgacgacct gcgcgacccg gccgggttcg agcaggtcga ctcgaaggtg 11280 

ctcgatcccg gcgctgcctg gtggctggtc gcgagacccg gtgcgcgcga cggcccgcag 11340 

tcggagaggg tgtgtggtaa attcgcgaag aagggcgctt ccgacgaatc cagaaacgcc 11400 

gagaagtgtg acatgtcttg tcatctacta atgcattccg atagccaccg gcgcatggaa 11460 

tccatttgtt ccccccaggg tggtgtcggg tgacaaatcc ggcctcaggt cggcctcaag 11520 

cctctttcga gcgggtgctg aggcttcccg cgtaccctcg gtggcctgcg ttcgggcggg 11580 

tgtcggggaa agggcggatc gaggagttcg gtagggcgtc gcggcgcgta ctccgggact 11640 

gatccgggtc gacgccccga cgcgtgacag ggcgtcgatc cgtgccgccc gtaccgccgg 11700 

ttttcggcga tggtcgcaga ttcctcccga cgtggtggac tcattggttc tcccgggtgt 11760 

ggccgcaccg tcggtggcct cgtcgggggt gtcggagacc gggtcgatcg ccgtccccgg 11820 

ccgtgccgac cagggtcggt ccgtcgccga ggtgggtcac cgtcgggtgg acccggtccg 11880 

ccggcggcca ccgcccgatc gtgcccacct tcgcctccgc gggtaaatgc ttcgtcgatc 11940 

tgatcgacac ttccggcgac gctatcaccg gagcattccc cggcaccacc ggtcgatgcc 12000 

tcgcgctttc caaacaggga aaacagcagc tcacagcggt tccaggcgcc gggcaatcct 12060 

agcgaagagt ctcgatgggg tcaaggtgaa ttctgtcaca gatgtttttg ttaaatgtac 12120 

tttcttcagc caccctcgac gttcatacaa ttggccggca tctctaccaa gggggagtga 12180 

gtggttgacg tgcccgatct actcggcacc cggactccgc acccagggcc gctcccattc 12240 

ccgtggcccc tgtgcggtca caacgaaccg gagctgcggg cccgcgcccg tcaattgcac 12300 

gcatatctcg aaggcatttc cgaggatgac gtggtggccg tcggcgccgc cctcgcgcgc 12360 

gagacacgcg cgcaggacgg gccgcaccgc gccgtcgtcg tggcctcctc ggtcaccgag 12420 

ctgaccgccg cgctcgccgc cctcgcccag ggccgcccac acccctcggt ggtacgcggt 12480 

gtcgcccgac ccacggcacc ggtggtgttc gtcctgcccg gtcagggcgc ccagtggccc 12540 

ggcatggcga cccgactgct cgccgagtcg cccgtcttcg ccgcggcgat gcgggcctgc 12600 

gagcgggcct tcgacgaggt caccgactgg tcgttgaccg aggtcctgga ctcacccgag 12660 

cacctgcgcc gcgtcgaggt ggtccagccc gcgctcttcg cggtgcagac ctcactggcc 12720 

gccctgtggc ggtcgttcgg ggtgcgaccc gacgccgtac tcggacacag catcggtgag 12780 

ctggccgccg ccgaggtctg cggcgccgtc gacgtcgagg ccgccgcgcg ggccgccgcc 12840 

ctgtggagcc gcgagatggt cccactggtg ggccggggtg acatggcggc ggtggcgctc 12900 

tccccggccg agctggcagc ccgggtcgag cggtgggacg acgacgtcgt gccggccggg 12960 

gtcaacggtc cccggtcggt gctgctcacc ggcgctcccg agcccatcgc acggcgggtc 13020 

gccgagctgg cggcacaggg cgtacgcgcc caggtcgtca acgtgtcgat ggcggcgcac 13080 

tcggcgcagg tcgacgccgt cgccgagggc atgcgctcgg cgctgacctg gttcgccccc 13140 

ggcgactccg acgtgcccta ctacgccggc ctcaccggcg ggcggctgga cacccgggaa 13200 

ctcggcgccg accactggcc gcgcagtttc cggctcccgg tgcgcttcga cgaggcgacc 13260 

cgtgcggtcc tggaactgca gcccggcacg ttcatcgagt cgagcccgca cccggtgctg 13320 

gcggcctccc tgcagcagac cctcgacgag gtcgggtccc cggccgcgat cgtgccgacc 13380 

ctgcaacgcg accagggcgg tctgcggcgg ttcctgctcg ccgtggcgca ggcgtacacc 13440 

ggtggcgtga cagtcgactg gaccgccgcc taccccgggg tgacccccgg ccacctgccg 13500 

tcggccgtcg ccgtcgagac cgacgaggga ccctcgacgg agttcgactg ggccgcgccc 13560 

gaccacgtac tgcgcgcgcg gctgctggag atcgtcggcg ccgagacggc cgcgctcgcc 13620 

gggcgggagg tcgacgcccg ggccaccttc cgggaactgg gcctcgactc ggtcctcgcg 13680 

gtgcagctgc ggacccgcct cgccacggcg accgggcggg atctgcacat cgccatgctc 13740 

tacgaccacc cgaccccgca cgccctcacc gaggcgctgc tgcgcggccc gcaggaggag 13800 

ccggggcggg gtgaggagac ggcacacccg acggaggccg aacccgacga acccgtcgcc 13860 

gtggtcgcca tggcgtgccg gctgcccggc ggcgtcacct caccggagga gttctgggag 13920 

ctgctggccg aggggcggga cgccgtcggc gggctgccca ccgaccgggg atgggacctg 13980 

gactcgctgt tccacccgga cccgacccgg tcgggcacgg cgcaccagcg cgctggtggc 14040 

ttcctcaccg gcgccacctc cttcgacgct gccttcttcg ggctgtcgcc acgggaggca 14100 

ctggccgtcg agccgcagca gcggatcacg ttggagctgt cgtgggaggt gctggaacgc 14160 

gccgggatcc ccccgacgtc gttgcggacc tcccggaccg gggtgttcgt cggtctgatc 14220 

ccccaggagt acggcccccg gctggccgag gggggtgagg gcgtcgaggg ctacctgatg 14280 

accgggacca ccaccagcgt cgcctccggt cgggtcgcct acaccctcgg cctggagggg 14340 

ccggcgatca gcgtcgacac cgcctgctcg tcgtcgctcg tcgccgtgca cctggcgtgc 14400 

cagtcgctgc ggcgcggcga gtcgacgatg gcgctcgccg gtggcgtgac ggtgatgccg 14460 

acaccgggca tgctcgtgga cttcagtcgg atgaactccc tcgcccccga cggacggtcc 14520 

aaggcgttct cggccgccgc cgacgggttc ggcatggccg aaggcgcagg gatgctcctg 14580 

ctggaacggc tctcggacgc ccgccgccac ggccacccgg tgctcgccgt gatcaggggc 14640 

accgctgtca actccgacgg cgcgagcaac ggactctccg ccccgaacgg ccgggcccag 14700 

gtccgggtga tccgacaggc cctcgccgag tccgggctga cgccccacac cgtcgacgtc 14760 

gtggagaccc acggcaccgg cacccgcctc ggtgatccga tcgaggcacg ggcgctctcc 14820 

gacgcgtacg gcggtgaccg tgagcacccg ctgcggatcg gctcggtcaa gtccaacatc 14880 

gggcacaccc aggccgccgc cggtgtcgcc ggtctgatca aactggtgtt ggcgatgcag 14940 

gccggtgtcc tgccccgcac cctgcacgcc gacgagccgt caccggagat cgactggtcc 15000 

tcgggcgcga tcagcctgct ccaggagccc gctgcctggc ccgccggcga gcggccccgc 15060 

cgggccgggg tgtcctcgtt cggcatcagc ggcaccaacg cacacgcgat catcgaggag 15120 

gcgccgccga ccggtgacga cacccgaccc gaccggatgg gcccggtggt gccctgggtg 15180 

ctctcggcga gcaccggcga ggcgttgcgc gcccgggcgg cgcggctggc cgggcaccta 15240 

cgcgagcacc ccgaccagga cctggacgac gtcgcctact cgctggccac cggtcgggcc 15300 

gcgctggcgt accgtagtgg gttcgtgccc gccgacgcgt ccacggcgct gcggatcctc 15360 

gacgaactcg ccgccggtgg atccggggac gcggtgaccg gcaccgcccg cgccccgcag 15420 

cgcgtcgtct tcgtcttccc cggccaggga tggcagtggg cggggatggc agtcgacctg 15480 

ctcgacggcg acccggtctt cgcctcggtg ctgcgggagt gcgccgacgc gttggaaccg 15540 

tacctggact tcgagatcgt cccgttcctg cgggccgagg cgcagcgccg gacccccgac 15600 

cacacgctct ccaccgaccg cgtcgacgtg gtccagccgg tgctgttcgc ggtgatggtg 15660 

tccctggcgg cccggtggcg ggcgtacggg gtggaaccgg cggccgtcat cggacactcc 15720 

cagggggaga ttgccgcggc gtgtgtggcc ggggcgctct cgctggacga cgcggcccgg 15780 

gcggtggccc tgcgcagccg ggtcatcgcc accatgcccg gcaacggcgc gatggcctcg 15840 

atcgccgcct ccgtcgacga ggtggcggcc cggatcgacg ggcgggtcga gatcgccgcc 15900 

gtcaacggtc cgcgcgcggt ggtggtctcc ggcgaccgtg acgacctgga ccgcctggtc 15960 

gcctcctgca ccgtcgaggg ggtgcgggcc aagcggctgc cggtggacta cgcgtcgcac 16020 

tcctcgcacg tcgaggccgt ccgtgacgcg ctccacgccg aactcggcga gttccggccg 16080 

ctgccgggct tcgtgccgtt ctactcgaca gtcaccggcc gctgggtcga gcccgccgaa 16140 

ctcgacgccg ggtactggtt tcgcaacctg cgccacaggg tccggttcgc cgacgcggtc 16200 

cgctccctcg ccgaccaggg gtacacgacg ttcctggagg tcagcgccca cccggtgctc 16260 

accacggcga tcgaggagat cggtgaggac cgtggcggtg acctcgtcgc tgtccactcg 16320 

ctgcgacgtg gggccggcgg tcccgtcgac ttcggctccg cgctggcccg cgccttcgtg 16380 

gccggcgtcg cagtggactg ggagtcggcg taccagggtg ccggggcgcg tcgggtgccg 16440 

ctgcccacgt acccgttcca gcgtgagcgc ttctggttgg aaccgaatcc ggcccgcagg 16500 

gtcgccgact ccgacgacgt ctcgtccctg cggtaccgca tcgaatggca cccgaccgat 16560 

ccgggtgagc cgggacggct cgacggcacc tggctgctgg cgacgtaccc cggtcgggcc 16620 

gacgaccggg tcgaggcggc gcggcaggcg ctggagtccg ccggggcgcg ggtcgaggac 16680 

ctggtggtgg agccccggac gggccgggtc gacctggtgc ggcggctcga cgccgtgggt 16740 

ccggtggcgg gcgtgctctg cctgttcgct gtcgcggagc cggcggccga acactccccg 16800 

ctggcggtga cgtcgttgtc ggacacgctc gacctgaccc aggcggtggc cgggtcgggc 16860 

cgggagtgtc cgatctgggt ggtcaccgag aacgccgtcg ccgtcgggcc cttcgaacgg 16920 

ctccgcgacc cggcccacgg cgcgctctgg gccctcggtc gggtcgtcgc cctggagaac 16980 

cccgccgtct ggggcggcct ggtcgacgtg ccgtcgggtt cggtcgccga gctgtcgcgt 17040 

cacctcggga cgaccctgtc cggcgccggc gaggaccagg tcgccctccg acccgacggg 17100 

acgtacgccc gccggtggtg cagggcgggc gcgggcggca cgggccggtg gcagccccgg 17160 

ggcacggtgc tcgtcaccgg cggcaccggc ggggtcggtc ggcacgtcgc ccggtggctg 17220 

gcccgccagg gcaccccgtg cctggtgctg gccagccgcc ggggaccgga cgccgacggg 17280 

gtcgaggagc tactcaccga actcgccgac ctgggcaccc gggccaccgt caccgcctgc 17340 

gacgtcaccg accgggagca gctccgtgcc ctcctcgcga ccgtcgacga cgagcacccg 17400 

ctgtcggcgg tgttccacgt cgccgcgacg ctcgacgacg gcaccgtcga gaccctcacc 17460 

ggtgaccgca tcgaacgggc caaccgggcg aaggtgctcg gtgcccgcaa cctgcacgag 17520 

ctgacccggg acgccgacct cgacgcgttc gtgctcttct cctcctccac cgccgcgttc 17580 

ggcgcgccgg ggctcggcgg ctacgtcccg ggcaacgcct acctcgacgg tctcgcccag 17640 

cagcgacgca gcgagggact cccggccacc tcggtggcgt ggggtacctg ggcgggcagc 17700 

gggatggccg agggtccggt cgccgaccgg ttccgccggc acggggtcat ggagatgcac 17760 

cccgaccagg ccgtcgaggg tctccgggtg gcactggtgc agggtgaggt agccccgatc 17820 

gtcgtcgaca tcaggtggga ccggttcctc ctcgcgtaca ccgcgcagcg ccccacccgg 17880 

ctcttcgaca ccctcgacga ggcccgtcgg gccgcgcccg gtcccgacgc cgggccgggg 17940 

gtggcggcgc tggccgggct gcccgtcggg gaacgcgaga aggcggtcct cgacctggta 18000 

cggacgcacg cggctgccgt cctcggccac gcctcggccg agcaggtgcc cgtcgacagg 18060 

gccttcgccg aactcggcgt cgactcgctg tcggccctgg aactgcgcaa ccggctgacc 18120 

actgcgaccg gggtccggct ggccacgacg acggtcttcg accacccgga cgtacggacc 18180 

ctggccggac acctggccgc cgaactgggc ggcggatcgg ggcgggagcg gcccgggggc 18240 

gaggccccga cggtggcccc gaccgacgag ccgatcgcca tcgtcgggat ggcctgccgg 18300 

ctgccggggg gagtggactc accggagcag ctgtgggagt tgatcgtctc cgggcgggac 18360 

accgcctcgg cggcacccgg ggaccggagc tgggatccgg cggagttgat ggtctccgac 18420 

acgacgggca cccgtaccgc cttcggcaac ttcatgcccg gggcgggcga gttcgacgcg 18480 

gcgttcttcg ggatctcgcc gcgtgaggcg ttggcgatgg atccgcagca gcggcacgcc 18540 

ctggagacca cctgggaggc gctggagaac gccggtatcc ggcccgagtc gttgcgcggt 18600 

acggacaccg gtgtcttcgt gggcatgtcc catcaggggt acgccaccgg ccgcccgaag 18660 

cccgaggacg aggtcgacgg ctacctgttg acaggcaaca ccgcgagcgt cgcctccggt 18720 

cggatcgcgt acgtgttggg gttggagggg ccggcgatca ctgtggacac ggcgtgttcg 18780 

tcgtcgcttg tggcgttgca cgtggcggcg ggttcgttgc gttctgggga ctgtggtctg 18840 

gcggtggcgg gtggggtgtc ggtgatggcc ggtccggagg tgttcaggga gttctcccgg 18900 

cagggcgcgt tggctccgga cggcaggtgc aagcccttct cggacgaggc cgacggcttc 18960 

ggtctggggg aggggtcggc cttcgtcgtg ttgcagcggt tgtcggtggc ggtgcgggag 19020 

gggcgtcggg tgttgggtgt ggtggtgggt tcggcggtga atcaggatgg ggcgagtaat 19080 

gggttggcgg cgccgtcggg ggtggcgcag cagcgggtga ttcggcgggc gtggggtcgt 19140 

gcgggtgtgt cgggtgggga tgtgggtgtg gtggaggcgc atgggacggg gacgcggttg 19200 

ggggatccgg tggagttggg ggcgttgttg gggacgtatg gggtgggtcg gggtggggtg 19260 

ggtccggtgg tggtgggttc ggtgaaggcg aatgtgggtc atgtgcaggc ggcggcgggt 19320 

gtggtgggtg tgatcaaggt ggtgttgggg ttgggtcggg ggttggtggg tccgatggtg 19380 

tgtcggggtg ggttgtcggg gttggtggat tggtcgtcgg gtgggttggt ggtggcggat 19440 

ggggtgcggg ggtggccggt gggtgtggat ggggtgcgtc ggggtggggt gtcggcgttt 19500 

ggggtgtcgg ggacgaatgc tcatgtggtg gtggcggagg cgccggggtc ggtggtgggg 19560 

gcggaacggc cggtggaggg gtcgtcgcgg gggttggtgg gggtggttgg tggtgtggtg 19620 

ccggtggtgc tgtcggcaaa gaccgaaacc gccctgcacg cccaggcacg tcgactcgcc 19680 

gaccacctgg agacgcaccc cgacgtcccg atgaccgacg tggtgtggac gctgacgcag 19740 

gcccgccaac gcttcgacag gcgcgcggtc ctcctcgccg ccgaccggac ccaggccgtg 19800 

gaacggctgc gcggcctcgc cgggggcgaa ccggggaccg gtgtggtgtc gggggtggcg 19860 

tcgggtggtg gtgtggtgtt tgtttttcct ggtcagggtg gtcagtgggt ggggatggcg 19920 

cgggggttgt tgtcggttcc ggtgtttgtg gagtcggtgg tggagtgtga tgcggtggtg 19980 

tcgtcggtgg tggggttttc ggtgttgggg gtgttggagg gtcggtcggg tgcgccgtcg 20040 

ttggatcggg tggatgtggt gcagccggtg ttgttcgtgg tgatggtgtc gttggcgcgg 20100 

ttgtggcggt ggtgtggggt tgtgcctgcg gcggtggtgg gtcattcgca gggggagatc 20160 

gcggcggcgg tggtggcggg ggtgttgtcg gtgggtgatg gtgcgcgggt ggtggcgttg 20220 

cgggcgcggg cgttgcgggc gttggccggc cacggcggca tggcctcggt acgccgaggc 20280 

cgcgacgacg tacagaagct cctcgacagc ggcccctgga cggggaagct ggagatcgcc 20340 

gcggtcaacg gccccgacgc ggtggtggtc tccggcgacc cccgagccgt gaccgagctg 20400 

gtcgagcact gtgacgggat cggggtccgg gcccggacga tccccgtcga ctacgcctcc 20460 

cactccgcac aggtcgagtc gctccgggag gagctgctct ccgtcctggc cgggatcgag 20520 

ggccgcccgg cgacggtgcc gttctactcc accctcaccg gtgggttcgt cgacggcacc 20580 

gaactggacg ccgactactg gtaccgcaac ctgcgccacc cggtgcggtt ccacgccgcc 20640 

gtcgaggcgc tggcagcgcg tgacctcacc acgttcgtcg aggtcagccc gcaccccgtg 20700 

ctgtcgatgg cggtcgggga gacgcttgcc gacgtggagt ccgccgtcac tgtgggcacc 20760 

ctggaacgcg acaccgacga cgtcgagcgc ttcctcacct ccctcgccga ggcgcacgtc 20820 

cacggcgtac ccgtggactg ggcggcggtc ctcggctccg gaaccctggt cgacctgccc 20880 

acctatccct tccagggacg gcggttctgg ctgcaccccg accgtggtcc gcgtgacgat 20940 

gtcgccgact ggttccaccg ggtcgactgg acggcgacgg ccaccgacgg gtcggcccga 21000 

ctcgacggtc gctggctggt ggtcgtaccc gaggggtaca cggacgacgg ctgggtcgtg 21060 

gaggtgcggg ccgccctcgc cgccggtggt gccgagccgg tggtgacgac ggtcgaggag 21120 

gtcaccgacc gggtcggtga cagcgacgcg gtggtgtcga tgctcgggct ggccgacgac 21180 

ggtgcggccg agaccctggc gctgctgcga cgactcgacg cacaggcgtc caccacccca 21240 

ctgtgggtgg tcaccgtggg ggccgtcgcc cccgccggtc cggtgcagcg ccccgaacag 21300 

gcgacggtgt gggggttggc ccttgtcgcc tccctggaac gcggacaccg gtggaccggc 21360 

ctgctggatc tgccgcagac accggacccg cagctacgac cccggctggt cgaggcgctc 21420 

gccggtgccg aggaccaggt agcggtccgc gccgacgccg tacacgcccg tcggatcgtc 21480 

cccaccccgg tcaccggagc cgggccgtac accgccccgg gcgggacgat cctcgtcacc 21540 

gggggcaccg ccggtctggg tgccgtcacc gcccgatggc tcgccgagcg cggtgccgaa 21600 

cacctcgccc tggtcagccg gcgcgggccg ggcaccgccg gcgtcgacga ggtggtccgg 21660 

gacctgaccg ggctcggcgt acgggtgtcg gtgcactcct gcgacgtcgg cgaccgcgag 21720 

tcggtcggcg ccctggtgca ggagttgaca gcagccggtg acgtggtccg gggggtggtc 21780 

cacgctgccg gtctgcccca gcaggtgcca ctgaccgaca tggacccggc cgacctcgcc 21840 

gacgtggtgg ccgtgaaggt cgacggcgcg gtgcacctgg ccgacctgtg cccggaggcc 21900 

gaactgttcc tgctgttctc ctccggggcc ggggtgtggg gcagtgcccg tcagggtgcg 21960 

tacgccgccg gaaacgcctt cctggacgcc ttcgcccgac accggcggga ccggggtctg 22020 

cccgccacct cggtggcgtg ggggctctgg gcggccgggg ggatgacagg ggaccaggag 22080 

gcggtgtcgt tcctgcgtga gcggggcgta cggccgatgt cggtgccgag ggcactggaa 22140 

gcgctggaac gggtcctcac cgccggggag accgcggtgg tcgtcgccga cgtcgactgg 22200 

gcggccttcg ccgagtcgta cacctccgcc cggccccggc cgctgctcca ccggctcgtc 22260 

acacctgcgg cggcggtcgg cgagcgcgac gagccgcgtg agcagaccct ccgggaccgg 22320 

ctggcggccc tgccccgggc cgagcggtcg gcggagctgg tacgcctggt ccggcgggac 22380 

gccgcagccg tgctcggcag cgacgcgaag gccgtacccg ccaccacgcc gttcaaggac 22440 

ctcgggttcg actcgctggc cgcggtccgg ttccgtaacc ggctggccgc ccacaccggt 22500 

ctgcgtctgc cggccaccct ggtcttcgag cacccgaacg ccgcagccgt cgccgacctc 22560 

ctccacgacc gactcggcga ggccggcgag ccgacccccg tccggtcggt gggcgccgga 22620 

ctggccgcgc tggagcaggc cctgcccgac gcctccgaca cggagcgggt cgagctggtc 22680 

gagcgcctgg aacggatgct cgccgggctc cgccccgagg ccggagccgg ggccgacgcc 22740 

ccgaccgccg gtgacgacct gggggaggcc ggcgtcgacg aactcctcga cgcgctcgaa 22800 

cgggaactcg acgccaggtg aacccgaact gaccgcagcc gcagccgaag cagagaccga 22860 

ggacctgtga ctgacaacga caaggtggcg gagtacctcc gtcgtgcgac gctcgacctg 22920 

cgggccgccc gcaagcgcct gcgcgagctg caatccgacc cgatcgcggt cgtcggcatg 22980 

gcctgccgcc taccgggcgg ggtgcacctc ccgcagcacc tgtgggacct cctgcgccag 23040 

gggcacgaga cggtgtccac cttccccacc gggcgcggct gggacctggc cgggctcttc 23100 

cacccggacc ccgaccaccc cggcaccagc tacgtcgacc ggggtgggtt cctcgacgac 23160 

gtggcgggct tcgacgccga gttcttcggg atctccccgc gcgaggccac ggccatggac 23220 

ccgcaacagc ggctgctgtt ggagaccagt tgggagctgg tggagagcgc cggcatcgat 23280 

ccgcactccc tgcgtggcac cccgaccggc gtcttcctcg gcgtggcgcg gctcggctac 23340 

ggcgagaacg gcaccgaagc cggtgacgcc gagggctatt cggtgaccgg ggtggcaccc 23400 

gctgtcgcct ccgggcggat ctcctacgcc ctcgggctgg agggtccgtc gatcagcgtg 23460 

gacaccgcgt gctcgtcgtc gttggtggcg ctgcacctgg cggtcgagtc gctgcggctg 23520 

ggcgagtcga gtctcgctgt cgtcggcggg gcggcggtca tggcgacacc aggggtgttc 23580 

gtcgacttca gccgccagcg ggcgttggcc gctgacggca ggtcgaaggc cttcggggcc 23640 

gccgccgacg ggttcggctt ctccgagggg gtctccctcg tcctgctcga acggctctcc 23700 

gaggccgaaa gcaacggcca cgaggtgttg gctgtcatcc gtggctccgc cctcaaccag 23760 

gacggggcca gcaacggtct cgccgcgccg aacgggaccg cccagcgcaa ggtgatccgg 23820 

caggcgctac gaaactgcgg cctgaccccg gccgacgtgg acgccgtgga ggcgcacggc 23880 

accggcacca cgctcggcga cccgatcgag gccaacgccc tgctggacac ctacggccgt 23940 

gaccgggatc cggaccaccc gctgtggctg gggtcggtga agtcgaacat cggccacacg 24000 

caggcggcgg cgggcgtcac cgggctgctc aagatggtgc tggcactgcg ccacgaggaa 24060 

ctgcccgcca ccctgcacgt cgacgagccc accccgcacg tggactggtc ctcgggagcg 24120 

gtacgcctgg cgacccgggg ccggccgtgg cggcggggtg accggccgag gcgggccggg 24180 

gtgtcggcgt tcggcatcag cgggaccaac gcccacgtga tcgtcgagga ggcacccgag 24240 

cggaccaccg agcgcaccgt cggcggcgac gtcggcccgg tcccgctcgt ggtgtccgcc 24300 

cggtcggcgg cggcgctacg ggcccaggcg gcccaggtcg ccgagctggt ggagggctcc 24360 

gacgtcgggc tggcggaggt cgggcggagc ctggccgtga cccgggcgcg acacgagcac 24420 

cgggcggcgg tggtggcgtc gacccgggcc gaggcggtgc gggggctgcg cgaggtcgcg 24480 

gcggtcgaac cgcgcggcga ggacaccgtc accggggtcg ccgagacgtc cgggcgcacc 24540 

gtcgtcttcc tcttcccggg acaggggtcc cagtgggtcg ggatgggcgc ggagctgctg 24600 

gactcggcac cggcgttcgc cgacacgatc cgcgcctgcg acgaggcgat ggcaccgttg 24660 

caggactggt cggtctccga cgtgctccgg caggagccgg gggcaccggg actggaccgg 24720 

gtcgacgtgg tgcagccggt gctgttcgcg gtgatggtgt cgttggcgcg gttgtggcag 24780 

tcgtacgggg tcacccccgc tgcggtggtg gggcactcgc agggggagat cgccgccgcc 24840 

cacgtggcgg gtgcgctctc cctcgccgac gcggcgaggc tggtggtggg ccgcagccgg 24900 

ttgctgcggt cgctgtccgg gggcggcggc atgagcgccg tcgcgctcgg tgaggccgag 24960 

gtacgccgcc gactgcggtc gtgggaggac cggatctccg tggccgccgt caacggaccc 25020 

cggtcggtgg tggtggccgg ggaaccggag gcgctgcggg agtggggacg ggagcgggag 25080 

gccgagggcg tacgggtccg cgagatcgac gtcgactacg cctcgcactc gccgcagatc 25140 

gacagggtcc gtgacgaact cctgacggtc acgggggaga tcgagccccg gtcggcggag 25200 

atcaccttct actcgacggt cgacgtccgt gctgtcgacg gcaccgacct ggacgcgggg 25260 

tactggtacc gcaacctgcg ggagacggtc cggttcgccg acgcgatgac ccggttggcc 25320 

gactcgggat acgacgcgtt cgtcgaggtc agcccgcatc cggtggtggt gtcggcggtc 25380 

gccgaggcgg tcgaggaggc aggtgtcgag gacgccgtcg tcgtcggcac cctgtcccgg 25440 

ggcgacggcg gaccgggggc gttcctgcgg tcggcggcca ccgcccactg cgccggtgtg 25500 

gacgtcgact ggacgcccgc cctcccggga gctgcgacga tcccgttgcc gacgtacccg 25560 

ttccaacgga agccgtactg gctgcggtcg tctgctcccg cccccgcctc ccacgatctc 25620 

gcctaccggg tgtcctggac gccgatcacc ccgcccgggg acggcgtact cgacggcgac 25680 

tggctggtgg tgcaccccgg gggcagcacc ggatgggtcg acgggttggc ggcggcgatc 25740 

accgccggcg gtggccgggt cgtcgcccac ccggtggact ccgtgacctc ccggaccggc 25800 

ctggccgagg cgctcgcccg gcgggacggc acgttccggg gggtgctgtc gtgggtggcg 25860 

accgacgaac ggcacgtcga ggccggtgcg gtcgccctgc tgaccctggc gcaggcgttg 25920 

ggtgacgccg gaatcgacgc accactgtgg tgcctgaccc aggaggcggt ccgtaccccc 25980 

gtcgacggtg acctggcccg accggcgcag gccgccctgc acggtttcgc ccaggtcgcc 26040 

cggctggagc tggcccgccg cttcggtggg gtgctcgacc tgcccgccac cgtcgacgcc 26100 

gccgggacgc gtctggtcgc ggcggtcctc gccggcggcg gcgaggacgt cgtcgccgtc 26160 

cgtggcgacc gtctctacgg ccgtcgcctg gtcagggcga ccctgccgcc gcccggcggg 26220 

gggttcaccc cgcacggcac cgtcctggtc accggcgcgg ccggtccggt gggcggtcgg 26280 

ctggcccggt ggctcgccga acggggtgcc acccgactcg tcctgcccgg cgcacacccg 26340 

ggcgaggagt tgctgaccgc gatccgggcc gccggtgcca ccgccgtggt gtgcgaaccg 26400 

gaggcggagg cactgcgtac ggcgatcggc ggggagttgc cgaccgcgct cgtacacgcc 26460 

gagacgttga cgaacttcgc cggcgtcgcc gacgccgacc ccgaggactt cgccgccacc 26520 

gtcgcggcga agaccgcgct gccgacggtc ctggcggagg tgctcggcga ccaccgcctc 26580 

gaacgggagg tctactgctc gtcggtggcc ggggtctggg gtggggtcgg catggccgcg 26640 

tacgccgccg gcagcgccta cctcgacgcc ctggtcgagc accgtcgcgc ccgggggcac 26700 

gccagcgcct cggtggcctg gaccccgtgg gccctgcccg gcgcggtcga cgacggtcgg 26760 

ctgcgcgagc gcggcctgcg cagcctcgac gtggccgacg ccctcgggac gtgggaacgt 26820 

ctgctccgcg ccggtgcggt gtcggtggcc gtcgccgacg tcgactggtc ggtcttcaca 26880 

gagggtttcg cggccatccg gccgaccccg ctcttcgacg aactcctcga ccggcgcggg 26940 

gaccccgacg gcgcgcccgt cgaccggccg ggggagccgg cgggcgagtg gggtcgacga 27000 

atcgcggcgc tgtccccgca ggaacagcgg gagacgttgc tgaccctcgt cggcgagacg 27060 

gtcgcggagg tgctgggaca cgagaccggc accgagatca acacccgtcg ggccttcagc 27120 

gaactcggcc tcgactcgct gggctcgatg gccctgcgtc agcgcctggc ggcccgtacc 27180 

ggcctgcgga tgccggcctc gctggtcttc gaccacccga cggtcaccgc gctcgcgcgg 27240 

tacctgcgtc gactggtcgt cggggactcc gacccgaccc cggtacgggt gttcggcccc 27300 

accgacgagg ccgaacccgt cgccgtggtc ggcatcggct gccggttccc cggcggcatc 27360 

gccacccccg aggacctctg gcgggtggtg cccgagggca cctccatcac caccggattc 27420 

cccaccgacc ggggctggga cctccggcgg ctctaccacc ccgacccgga ccaccccggc 27480 

accagctacg tcgacagggg gggattcctc gacggggccc cggacttcga ccccgggttc 27540 

ttcgggatca ccccccgcga ggcgctggcg atggacccgc agcagcggct caccctggag 27600 

atcgcgtggg aggcggtgga acgggcgggc atcgacccgg agaccctcct cggcagcgac 27660 

accggcgtct tcgtcggcat gaacggccag tcctacctgc aactgctgac cggggagggt 27720 

gaccggctca acggctacca ggggttgggc aactcggcga gcgtgctctc cggccgtgtc 27780 

gcctacacct tcgggtggga ggggccggcg ctgacggtgg acaccgcctg ctcgtcctcg 27840 

ctggtcgcca tccacctcgc catgcagtcg ctgcgtcggg gtgagtgctc gctggcgttg 27900 

gccggcgggg tgacggtcat ggccgacccg tacaccttcg tggacttcag cgcacagcgg 27960 

gggctcgccg ccgacgggcg gtgcaaggcg ttctccgcgc aggccgacgg gttcgccctc 28020 

gccgagggcg tcgcggcgct cgtcctcgaa ccgttgtcca aggcgcggcg aaacggccac 28080 

caggtgctgg cggtgctgcg cggcagcgcc gtcaaccagg acggggccag caacggcctc 28140 

gccgccccga acgggccgtc gcaggaacgg gtgatcaggc aggccctgac cgcctccggg 28200 

ctgcgtcccg ccgacgtcga catggtggag gcgcacggga cgggcaccga actcggcgac 28260 

ccgatcgagg ccggggcgct catcgcggcg tacggccggg accgggaccg gccgctctgg 28320 

ctgggctcgg tgaagacgaa catcggccac acccaggccg ccgccggtgc cgccggggtg 28380 

atcaaggcgg tcctggcgat gcggcacggc gtactcccga ggtcgctgca cgccgacgag 28440 

ttgtccccgc acatcgactg ggcggacggg aaggtcgagg tgctccgcga ggcacgacag 28500 

tggccccccg gtgagcgccc ccgccgcgcc ggggtgtcct ccttcggcgt cagcgggacc 28560 

aacgcccacg tcatcgtcga ggaggcaccc gccgaaccgg accccgaacc ggttcccgcc 28620 

gccccgggcg ggcccctgcc cttcgtcctg cacggacgca gcgtccagac ggtccggtcc 28680 

caggcgcgga ccctcgccga acacctgcgc accaccggcc accgggacct cgccgacacc 28740 

gcccgtaccc tggccaccgg tcgcgcccgt ttcgacgtcc gggccgcagt gctcggcacc 28800 

gaccgggagg gtgtctgcgc cgccctcgac gcgctggcgc aggatcgccc ctcgcccgac 28860 

gtcgtcgccc cggcggtctt cgccgcccgt acccccgtcc tggtcttccc cgggcagggg 28920 

tcgcagtggg tcggcatggc ccgtgacctg ctcgactcct ccgaggtgtt cgccgagtcg 28980 

atgggccggt gcgccgaggc gctgtcgccg tacaccgact gggacctgct cgacgtggtc 29040 

cgtggggtcg gcgaccccga cccgtacgac cgggtggacg tgctccagcc ggtgctgttc 29100 

gcggtgatgg tgtcgctggc gcggttgtgg cagtcgtacg gggtgactcc gggtgcggtg 29160 

gtgggtcact cgcaggggga gatcgccgcc gcgcacgtgg ctggtgcgtt gtcgttggcc 29220 

gacgccgcca gggtggtggc gttgcgcagc cgggtgctgc gggagctcga cgaccagggc 29280 

ggcatggtgt cggtcggcac ctcccgcgcc gagttggact cggtcctgcg ccggtgggac 29340 

gggcgggtcg cggtggcggc ggtgaacgga cccggcacgc tcgtggtggc cggacccacc 29400 

gccgaactgg acgagttcct cgcggtggcc gaggcccgcg agatgaggcc gcgtcggatc 29460 

gcggtgcgct acgcgtcgca ctccccggag gtggcccggg tcgaacagcg gctcgccgcc 29520 

gaactcggca ccgtcaccgc cgtcggcggc acggtcccgc tctactccac cgccaccggg 29580 

gacctcctcg acaccacagc catggacgcc gggtactggt accgcaacct gcgccaaccg 29640 

gtgctgttcg agcacgccgt ccgcagcctc ctggagcggg gattcgagac gttcatcgag 29700 

gtcagcccgc accctgtgct gctgatggcg gtcgaggaga ccgccgagga cgccgagcgc 29760 

ccggtcaccg gcgtgccgac gctgcgccgc gaccacgacg ggccgtcgga gttcctccgc 29820 

aacctcctgg gggcgcacgt gcacggggtc gacgtcgacc tgcgtccggc ggtcgcccac 29880 

ggccgcctgg tcgacctgcc cacctacccc ttcgacaggc agcggctctg gcccaagccg 29940 

caccgcaggg ccgacacctc gtcgctgggg gtccgtgact cgacccaccc gctgctgcac 30000 

gccgcagtcg acgtacccgg tcacggcgga gcggtgttca ccgggcggct ctcccccgac 30060 

gagcagcagt ggctgaccca gcacgtggtg ggtgggcgga acctggtgcc cggcagtgtc 30120 

ctggtcgacc tcgcgctcac cgccggggcc gacgtcggcg tgccggtgct ggaggaactc 30180 

gtcctgcagc agccgctggt gttgaccgcc gccggtgcgt tgctgcgcct gtcggtcggc 30240 

gccgccgacg aggacgggcg gcggccggtc gagatccacg ccgccgagga cgtctccgac 30300 

ccggccgagg cccggtggtc ggcgtacgcg accgggaccc tcgccgtcgg cgtggccggc 30360 

ggcggccggg acggcacaca gtggcccccg cccggcgcca ccgccctgac gttgaccgac 30420 

cactacgaca ccctcgccga actgggctac gagtacgggc cggcgttcca ggcgctgcgc 30480 

gccgcgtggc agcacggcga cgtggtctac gcggaggtgt ccctcgacgc cgtcgaggag 30540 

gggtacgcgt tcgacccggt gctgctcgac gccgtcgccc agaccttcgg cctgaccagt 30600 

cgcgcccccg ggaagctccc cttcgcctgg cggggcgtca ccctgcacgc caccggggcc 30660 

actgcggtac gggtggtggc gacccccgcc ggaccggacg cggtggccct gcgggtcacc 30720 

gacccgaccg gtcagctcgt cgccacggtg gacgccctgg tcgtcaggga cgccggggcg 30780 

gatcgggacc agccgcgcgg ccgcgacggc gacctgcacc gcctggagtg ggtacggctg 30840 

gccaccccgg acccgacccc ggcggcggtg gtgcacgtgg cggccgacgg gctcgacgac 30900 

ctgctgcgcg ccggtggtcc ggcaccacag gccgtcgtcg tccgctaccg tcccgacggc 30960 

gacgacccga cggccgaggc ccgtcacggg gtgctctggg cggccacgct cgtgcgccgt 31020 

tggctcgacg acgaccggtg gcccgccacc accctggtgg tggccacgtc cgcaggggtc 31080 

gaggtctccc ccggggacga cgtgccgcgc cccggggccg ccgccgtgtg gggggtgctg 31140 

cgctgcgccc aggcggagtc cccggaccgc ttcgtgctcg tcgacggcga cccggagacg 31200 

cccccggcgg tgccggacaa tccgcagctc gcggtccgtg acggtgcggt gttcgtgcca 31260 

cggctgacgc cgctcgccgg tcccgtgccg gccgtcgccg accgggcgta ccggctggtg 31320 

cccggcaacg gcggctccat cgaggcagtg gccttcgccc ccgtccccga cgccgaccgg 31380 

cccctggcgc cggaggaggt acgcgtcgcc gtccgcgcca ccggcgtgaa cttccgtgac 31440 

gtcctgctcg cgctcggcat gtacccggaa ccggccgaga tgggcaccga ggcgtccggt 31500 

gtggtcaccg aggtcgggtc gggtgtccgg cggttcaccc ccggccaggc ggtgacgggc 31560 

ctgttccagg gggccttcgg gccggtggcg gtcgccgacc accggctcct caccccggtc 31620 

cccgacgggt ggcgggcggt ggacgccgca gccgtaccca tcgcgttcac caccgcccac 31680 

tacgcgctgc acgacctggc cgggttgcag gccgggcagt ccgtgctggt ccacgccgcc 31740 

gccggcgggg tggggatggc tgccgtcgcg ttggcccgtc gggccggggc ggaggtgttc 31800 

gccacggcca gcccggccaa acacccgacg ctgcgggcgc tcggcctcga cgacgaccac 31860 

atcgcctcgt cccgggagag cgggttcggt gagcggttcg ccgcgcgtac cggggggcgg 31920 

ggcgtcgacg tggtcctgaa ctcgctcacc ggcgacctgc tcgacgagtc cgcgcggctg 31980 

ctcgccgacg gcggggtctt cgtcgagatg ggcaagaccg acctgcggcc ggcggagcag 32040 

ttccggggcc ggtacgtccc gttcgacctg gccgaggccg gtcccgatcg gctcggcgag 32100 

atcctggagg aggtcgtcgg tctgctggcc gccggtgccc tcgaccggtt gccggtgtcg 32160 

gtgtgggagt tgtcggcggc cccggccgcg ctcacccaca tgagccgggg ccgacacgtg 32220 

ggcaagctcg tcctcaccca gcccgccccc gtgcaccccg acggaacggt gctggtcacc 32280 

ggcgggaccg gcaccctggg gcggctggtc gcccgccacc tggtgaccgg gcacggcgta 32340 

ccccacctcc tggtggccag ccggcgcggt ccggcggccc cgggcgcggc cgagctgcgc 32400 

gccgacgtcg aaggcctcgg cgcgaccatc gagatcgtcg cctgcgacac cgccgaccgg 32460 

gaggcgctcg cggcgctgct cgactcgatc cccgcggacc gtccgctgac cggggtggtg 32520 

cacaccgccg gggtcctggc cgacgggctg gtcacctcca tcgacgggac cgccaccgat 32580 

caggtcctgc gggccaaggt cgacgcggcg tggcacctgc acgacctgac ccgggacgcg 32640 

gacctgagct tcttcgtgct gttctcgtcg gcggcgtcgg tgctggccgg tcccgggcag 32700 

ggcgtgtacg cggcggccaa cggggtcctc aacgccctgg ccgggcaacg gcgggccctc 32760 

ggactgcccg cgaaggcgct cgggtggggc ctgtgggcgc aggccagcga gatgaccagc 32820 

ggcctcggtg accggatcgc ccgtaccggg gtcgccgcgc tgccgaccga gcgggcgctg 32880 

gccctgttcg acgcggctct gcgcagcggc ggggaggtgc tgttcccgct gtctgtcgac 32940 

aggtcggcgc tgcgccgggc cgagtacgtc cccgaggtgc tgcgcggcgc ggtccggtcc 33000 

acgccacggg ccgccaacag ggccgagacc ccgggccggg gcctgctcga ccgtctcgtc 33060 

ggtgcacccg agaccgatca ggtggccgcg ctggccgagc tggtccgctc gcacgcggcg 33120 

gcggtcgccg gctacgactc ggccgaccag ctgcccgaac gcaaggcgtt caaggacctc 33180 

gggttcgact cgctggcggc ggtggagctg cgcaaccggc tcggcgtcac caccggcgta 33240 

cggctgccca gcacgctggt gttcgaccac ccgacaccgc tggcggtggc cgaacacctg 33300 

cggtcggagt tgttcgccga ctccgcgccg gacgtcgggg tcggtgcgcg cctcgacgac 33360 

ctggaacggg cgctcgacgc cctgcccgac gcgcagggac acgccgacgt cggggcccgc 33420 

ctggaggcgc tgctgcgccg gtggcagagc cgacgacccc cggagaccga gccagtgacg 33480 

atcagtgacg acgccagtga pgacgagctg ttctcgatgc tcgacaggcg tctcggcggg 33540 

ggaggggacg tctaggtgac aggtcgattc cgccccgcgg cagtggaccg taccgccctg 33600 

acaggtccac cgggttcgcg tcgcctccca cacccgacgg ccggggtatc cacggaaggg 33660 

atccgatgag cgagagcagc ggcatgaccg aggaccgcct ccggcgctat ctcaagcgca 33720 

ccgtcgccga actcgactcg gtgacaggtc ggctcgacga ggtcgagtac cgggcccgcg 33780 

aaccgatcgc cgtcgtcggc atggcctgcc ggttccccgg gggtgtggac tcgccggagg 33840 

cgttctggga gttcatccgc gacggtggtg acgcgatcgc cgaggcgccc acggaccgtg 33900 

gctggccgcc ggcaccgcga ccccgcctcg gtggtctcct cgcggagccg ggcgcgttcg 33960 

acgccgcctt cttcggcatc tcaccccgcg aggcgctcgc gacggacccc cagcagcgcc 34020 

tgatgctgga gatctcctgg gaggcgttgg agcgtgcggg tttcgacccg tcgagcctgc 34080 

gcggcagcgc cggtggcgtc ttcaccggtg tcggtgcggt ggactacgga cccaggccgg 34140 

acgaggcacc cgaggaggtg ctcggctacg tcggcatcgg caccgcctcc agcgtcgcct 34200 

ccggacgggt ggcgtacacc ctggggttgg agggtccagc cgtcaccgtc gacaccgcct 34260 

gctcctccgg gctcaccgcg gtgcacctgg cgatggagtc gctgcgccgc gacgagtgca 34320 

ccctggtcct cgccggtggg gtcaccgtga tgagcagccc gggtgcgttc accgagttcc 34380 

gcagccaggg cgggttggcc gaggacggcc gctgcaaacc gttctcccgc gccgccgacg 34440 

gcttcgggct cgccgagggg gccggggtcc tggtgctcca acggctgtcc gtcgcccggg 34500 

ccgagggccg gccggtgctg gccgtactgc gtggctcggc gatcaaccag gacggtgcca 34560 

gcaacgggct caccgcgccg agcggccccg cccagcggcg ggtgatcagg caggcgttgg 34620 

agcgggcgcg gctgcgtccc gtcgacgtgg actacgtgga ggcccacggc accggcaccc 34680 

ggctgggcga tccgatcgag gcgcacgccc tgctcgacac gtacggtgcc gaccgggaac 34740 

ccggccgccc gctctgggtc ggatcggtga agtccaacat cggtcacacc caggcggcgg 34800 

cgggggtggc cggggtgatg aagaccgtgc tggcgctgcg gcatcgggag atcccggcga 34860 

cgttgcactt cgacgagccc tcgccgcacg tcgactggga ccggggtgcg gtgtcggtgg 34920 

tgtccgagac ccggccctgg ccggtggggg agcgcccgcg ccgggcgggg gtgtcctcgt 34980 

tcggcatcag cggcaccaac gcgcacgtca tcgtcgagga ggcgccgagc ccgcaggcgg 35040 

ccgacctcga cccgaccccc ggcccggcaa ccggagcgac ccccggaacg gatgccgccc 35100 

ccaccgccga gccgggtgcg gaggcggtcg cactggtgtt ctccgcgcgc gacgagcggg 35160 

ccctgcgcgc ccaggcggcc cggctcgccg accgtctcac cgacgacccg gccccctcgt 35220 

tgcgcgacac cgccttcacc ctggtcaccc gccgtgccac ctgggagcat cgggcggtcg 35280 

tcgtcggcgg gggcgaggag gtcctcgccg gcctccgggc cgtcgccggg ggacgtcccg 35340 

tcgacggagc cgtcagcggg cgggcgcgcg ccggccgccg ggtggtgctg gtcttccccg 35400 

ggcagggcgc acagtggcag ggcatggccc gggacctgct gcggcagtcg ccgaccttcg 35460 

cggagtccat cgacgcctgc gagcgggcgc tcgccccgca cgtggactgg tcgctgcgcg 35520 

aggtgctcga cggcgagcag tcgttggacc ccgtcgacgt ggtgcagccg gtgctgttcg 35580 

cggtgatggt gtcgttggcg cggttgtggc agtcgtacgg ggtgactccg ggtgcggtgg 35640 

tgggtcactc gcagggggag atcgccgccg cgcacgtggc tggtgcgttg tcgttggccg 35700 

acgccgccag ggtggtggcg ttgcgcagcc gggtgctgcg ccgtctcggt ggtcacggcg 35760 

ggatggcgtc gttcgggctc caccccgacc aggccgccga gcggatcgcg cgcttcgcgg 35820 

gtgcgctgac tgtcgcctcg gtcaacggtc cccgttcggt ggtgctggcc ggggagaacg 35880 

gcccgttgga cgagctgatc gccgagtgcg aggccgaggg cgtgaccgcc cgtcggatcc 35940 

ccgtcgacta cgcctcacac tccccgcagg tggagtcgct gcgtgaggag ctgctcgccg 36000 

cactggccgg ggtccgtccg gtgtcggccg ggatccccct gtactcgacc ctgaccggtc 36060 

aggtcatcga aacggcgacg atggacgccg actactggtt cgccaacctc cgggagccgg 36120 

tgcgcttcca ggacgccacc aggcagctcg ccgaggcggg gttcgacgcc ttcgtcgagg 36180 

tcagcccgca cccggtgttg acagtcggtg tcgaggccac cctcgaggca gtgctgcccc 36240 

ccgacgcgga tccgtgtgtc acaggcaccc tgcgccgcga acgcggcggt ctcgcgcagt 36300 

tccacaccgc gctcgccgag gcgtacaccc ggggggtgga ggtcgactgg cgtaccgcag 36360 

tgggtgaggg acgcccggtc gacctgccgg tctacccgtt ccaacgacag aacttctggc 36420 

tcccggtccc cctgggccgg gtccccgaca ccggcgacga gtggcgttac cagctcgcct 36480 

ggcaccccgt cgacctcggg cggtcctccc tggccggacg ggtcctggtg gtgaccggag 36540 

cggcagtacc cccggcctgg acggacgtgg tccgcgacgg cctggaacag cgcggggcga 36600 

ccgtcgtgtt gtgcaccgcg cagtcgcgcg cccggatcgg cgccgcactc gacgccgtcg 36660

acggcaccgc cctgtccact gtggtctctc tgctcgcgct cgccgagggc ggtgctgtcg 36720

acgaccccag cctggacacc ctcgcgttgg tccaggcgct cggcgcagcc gggatcgacg 36780

tccccctgtg gctggtgacc agggacgccg ccgccgtgac cgtcggagac gacgtcgatc 36840 

cggcccaggc catggtcggt gggctcggcc gggtggtggg cgtggagtcc cccgcccggt 36900 

ggggtggcct ggtggacctg cgcgaggccg acgccgactc ggcccggtcg ctggccgcca 36960 

tactggccga cccgcgcggc gaggagcagt tcgcgatccg gcccgacggc gtcaccgtcg 37020 

cccgtctcgt cccggcaccg gcccgcgcgg cgggtacccg gtggacgccg cgcgggaccg 37080 

tcctggtcac cggcggcacc ggcggcatcg gcgcgcacct ggcccgctgg ctcgccggtg 37140 

cgggcgccga gcacctggtg ctgctcaaca ggcggggagc ggaggcggcc ggtgccgccg 37200 

acctgcgtga cgaactggtc gcgctcggca cgggagtcac catcacggcc tgcgacgtcg 37260 

ccgaccgcga ccggttggcg gccgtcctcg acgccgcacg ggcgcaggga cgggtggtca 37320 

cggcggtgtt ccacgccgcc gggatctccc ggtccacagc ggtacaggag ctgaccgaga 37380 

gcgagttcac cgagatcacc gacgcgaagg tgcggggtac ggcgaacctg gccgaactct 37440 

gtcccgagct ggacgccctc gtgctgttct cctcgaacgc ggcggtgtgg ggcagcccgg 37500 

ggctggcctc ctacgcggcg ggcaacgcct tcctcgacgc cttcgcccgt cgtggtcggc 37560 

gcagtgggct gccggtcacc tcgatcgcct ggggtctgtg ggccgggcag aacatggccg 37620 

gtaccgaggg cggcgactac ctgcgcagcc agggcctgcg cgccatggac ccgcagcggg 37680 

cgatcgagga gctgcggacc accctggacg ccggggaccc gtgggtgtcg gtggtggacc 37740 

tggaccggga gcggttcgtc gaactgttca ccgccgcccg ccgccggccc ctcttcgacg 37800 

aactcggtgg ggtccgcgcc ggggccgagg agaccggtca ggaatcggat ctcgcccggc 37860 

ggctggcgtc gatgccggag gccgaacgtc acgagcatgt cgcccggctg gtccgagccg 37920 

aggtggcagc ggtgctgggc cacggcacgc cgacggtgat cgagcgtgac gtcgccttcc 37980 

gtgacctggg attcgactcc atgaccgccg tcgacctgcg gaaccggctc gcggcggtga 38040 

ccggggtccg ggtggccacg accatcgtct tcgaccaccc gacagtggac cgcctcaccg 38100 

cgcactacct ggaacgactc gtcggtgagc cggaggcgac gaccccggct gcggcggtcg 38160

tcccgcaggc acccggggag gccgacgagc cgatcgcgat cgtcgggatg gcctgccgcc 38220 

tcgccggtgg agtgcgtacc cccgaccagt tgtgggactt catcgtcgcc gacggcgacg 38280 

cggtcaccga gatgccgtcg gaccggtcct gggacctcga cgcgctgttc gacccggacc 38340 

ccgagcggca cggcaccagc tactcccggc acggcgcgtt cctggacggg gcggccgact 38400 

tcgacgcggc gttcttcggg atctcgccgc gtgaggcgtt ggcgatggat ccgcagcagc 38460

ggcaggtcct ggagacgacg tgggagctgt tcgagaacgc cggcatcgac ccgcactccc 38520 

tgcgcggtac ggacaccggt gtcttcctcg gcgctgcgta ccaggggtac ggccagaacg 38580 

cgcaggtgcc gaaggagagt gagggttacc tgctcaccgg tggttcctcg gcggtcgcct 38640

ccggtcggat cgcgtacgtg ttggggttgg aggggccggc gatcactgtg gacacggcgt 38700 

gttcgtcgtc gcttgtggcg ttgcacgtgg cggccgggtc gctgcgatcg ggtgactgtg 38760 

ggctcgcggt ggcgggtggg gtgtcggtga tggccggtcc ggaggtgttc accgagttct 38820 

ccaggcaggg cgcgctggcc cccgacggtc ggtgcaagcc cttctccgac caggccgacg 38880 

ggttcggatt cgccgagggc gtcgctgtgg tgctcctgca gcggttgtcg gtggcggtgc 38940 

gggaggggcg tcgggtgttg ggtgtggtgg tgggttcggc ggtgaatcag gatggggcga 39000 

gtaatgggtt ggcggcgccg tcgggggtgg cgcagcagcg ggtgattcgg cgggcgtggg 39060 

gtcgtgcggg tgtgtcgggt ggggatgtgg gtgtggtgga ggcgcatggg acggggacgc 39120

ggttggggga tccggtggag ttgggggcgt tgttggggac gtatggggtg ggtcggggtg 39180 

gggtgggtcc ggtggtggtg ggttcggtga aggcgaatgt gggtcatgtg caggcggcgg 39240 

cgggtgtggt gggtgtgatc aaggtggtgt tggggttggg tcgggggttg gtgggtccga 39300 

tggtgtgtcg gggtgggttg tcggggttgg tggattggtc gtcgggtggg ttggtggtgg 39360 

cggatggggt gcgggggtgg ccggtgggtg tggatggggt gcgtcggggt ggggtgtcgg 39420 

cgtttggggt gtcggggacg aatgctcatg tggtggtggc ggaggcgccg gggtcggtgg 39480

tgggggcgga acggccggtg gaggggtcgt cgcgggggtt ggtgggggtg gctggtggtg 39540 

tggtgccggt ggtgctgtcg gcaaagaccg aaaccgccct gaccgagctc gcccgacgac 39600 

tgcacgacgc cgtcgacgac accgtcgccc tcccggcggt ggccgccacc ctcgccaccg 39660 

gacgcgccca cctgccctac cgggccgccc tgctggcccg cgaccacgac gaactgcgcg 39720 

acaggctgcg ggcgttcacc actggttcgg cggctcccgg tgtggtgtcg ggggtggcgt 39780 

cgggtggtgg tgtggtgttt gtttttcctg gtcagggtgg tcagtgggtg gggatggcgc 39840 

gggggttgtt gtcggttccg gtgtttgtgg agtcggtggt ggagtgtgat gcggtggtgt 39900 

cgtcggtggt ggggttttcg gtgttggggg tgttggaggg tcggtcgggt gcgccgtcgt 39960 

tggatcgggt ggatgtggtg cagccggtgt tgttcgtggt gatggtgtcg ttggcgcggt 40020 

tgtggcggtg gtgtggggtt gtgcctgcgg cggtggtggg tcattcgcag ggggagatcg 40080 

cggcggcggt ggtggcgggg gtgttgtcgg tgggtgatgg tgcgcgggtg gtggcgttgc 40140 

gggcgcgggc gttgcgggcg ttggccggcc acggcggcat ggtctccctc gcggtctccg 40200 

ccgaacgcgc ccgggagctg atcgcaccct ggtccgaccg gatctcggtg gcggcggtca 40260 

actccccgac ctcggtggtg gtctcgggtg acccacaggc cctcgccgcc ctcgtcgccc 40320

actgcgccga gaccggtgag cgggccaaga cgctgcctgt ggactacgcc tcccactccg 40380

cccacgtcga acagatccgc gacacgatcc tcaccgacct ggccgacgtc acggcgcgcc 40440

gacccgacgt cgccctctac tccacgctgc acggcgcccg gggcgccggc acggacatgg 40500 

acgcccggta ctggtacgac aacctgcgct caccggtgcg cttcgacgag gccgtcgagg 40560 

ccgccgtcgc cgacggctac cgggtcttcg tcgagatgag cccacacccg gtcctcaccg 40620 

ccgcggtgca ggagatcgac gacgagacgg tggccatcgg ctcgctgcac cgggacaccg 40680

gcgagcggca cctggtcgcc gaactcgccc gggcccacgt gcacggcgta ccagtggact 40740 

ggcgggcgat cctccccgcc acccacccgg ttcccctgcc gaactacccg ttcgaggcga 40800 

cccggtactg gctcgccccg acggcggccg accaggtcgc cgaccaccgc taccgcgtcg 40860 

actggcggcc cctggccacc accccggcgg agctgtccgg cagctacctc gtcttcggcg 40920 

acgccccgga gaccctcggc cacagcgtcg agaaggccgg cgggctcctc gtcccggtgg 40980 

ccgctcccga ccgggagtcc ctcgcggtcg ccctggacga ggcggccgga cgactcgccg 41040 

gtgtgctctc cttcgccgcc gacaccgcca cccacctggc ccggcaccga ctcctcggcg 41100 

aggccgacgt cgaggcccca ctctggctgg tcaccagcgg cggcgtcgca ctcgacgacc 41160 

acgacccgat cgactgcgac caggcaatgg tgtgggggat cggacgggtg atgggtctgg 41220 

agaccccgca ccggtggggc ggcctggtgg acgtgaccgt cgaacccacc gccgaggacg 41280 

gggtggtctt cgccgccctc ctggccgccg acgaccacga ggaccaggtg gcgctgcgcg 41340 

acggcatccg ccacggccga cggctcgtcc gcgccccgct gaccacccga aacgccaggt 41400 

ggacaccggc gggcacggcg ctcgtcacgg gcggtacggg tgccctcggc ggccacgtcg 41460

cgcggtacct ggcccggtcc ggggtgaccg atctcgtcct gctcagcagg agcggccccg 41520 

acgcacccgg tgccgccgaa ctggccgccg aactggccga cctcggggcc gagccgagag 41580 

tcgaggcgtg cgacgtcacc gacgggccac gcctgcgcgc cctggtgcag gagctacggg 41640 

aacaggaccg gccggtccgg atcgtcgtcc acaccgcagg ggtgcccgac tcccgtcccc 41700

tcgaccggat cgacgaactg gagtcggtca gcgccgcgaa ggtgaccggg gcgcggctgc 41760 

tcgacgagct ctgcccggac gccgacacct tcgtcctgtt ctcctcgggg gcgggagtgt 41820 

ggggtagcgc gaacctgggc gcgtacgcgg cagccaacgc ctacctggac gccctggccc 41880 

accgccgccg ccaggcgggc cgggccgcga cctcggtcgc ctggggggcg tgggccggcg 41940 

acggcatggc caccggcgac ctcgacgggc tgacccggcg cggtctgcgg gcgatggcac 42000 

cggaccgggc gctgcgcgcc tgcaccaggc gttggaccac ccacgacacc tgtgtgtcgg 42060 

tagccgacgt cgactgggac cgcttcgccg tgggtttcac cgccgcccgg cccagacccc 42120 

tgatcgacga actcgtcacc tccgcgccgg tggccgcccc caccgctgcg gcggccccgg 42180 

tcccggcgat gaccgccgac cagctactcc agttcacgcg ctcgcacgtg gccgcgatcc 42240

tcggtcacca ggacccggac gcggtcgggt tggaccagcc cttcaccgag ctgggcttcg 42300

actcgctcac cgccgtcggc ctgcgcaacc agctccagca ggccaccggg cggacgctgc 42360 

ccgccgccct ggtgttccag caccccacgg tacgcagact cgccgaccac ctcgcgcagc 42420 

agctcgacgt cggcaccgcc ccggtcgagg cgacgggcag cgtcctgcgg gacggctacc 42480

ggcgggccgg gcagaccggc gacgtccggt cgtacctgga cctgctggcg aacctgtcgg 42540 

agttccggga gcggttcacc gacgcggcga gcctgggcgg acagctggaa ctcgtcgacc 42600 

tggccgacgg atccggcccg gtcactgtga tctgttgcgc gggcactgcg gcgctctccg 42660 

ggccgcacga gttcgcccga ctcgcctcgg cgctgcgcgg caccgtgccg gtgcgcgccc 42720 

tcgcgcaacc cgggtacgag gcgggtgaac cggtgccggc gtcgatggag gcagtgctcg 42780 

gggtgcaggc ggacgcggtc ctcgcggcac agggcgacac gccgttcgtg ctggtcggac 42840

actcggcggg ggccctgatg gcgtacgccc tggcgaccga gctggccgac cggggccacc 42900 

cgccacgtgg cgtcgtgctc ctcgacgtgt acccacccgg tcaccaggag gcggtgcacg 42960 

cctggctcgg cgagctgacc gccgccctgt tcgaccacga gaccgtacgg atggacgaca 43020 

cccggctcac ggccctgggg gcgtacgaca ggctgaccgg caggtggcgt ccgagggaca 43080 

ccggtctgcc cacgctggtg gtggccgcca gcgagccgat gggggagtgg ccggacgacg 43140 

gttggcagtc cacgtggccg ttcgggcacg acagggtcac ggtgcccggt gaccacttct 43200

cgatggtgca ggagcacgcc gacgcgatcg cgcggcacat cgacgcctgg ttgagcgggg 43260 

agagggcatg aacacgaccg atcgcgccgt gctgggccga cgactccaga tgatccgggg 43320 

actgtactgg ggttacggca gcaacggaga cccgtacccg atgctgttgt gcgggcacga 43380 

cgacgacccg caccgctggt accgggggct gggcggatcc ggggtccggc gcagccgtac 43440 

cgagacgtgg gtggtgaccg accacgccac cgccgtgcgg gtgctcgacg acccgacctt 43500

cacccgggcc accggccgga cgccggagtg gatgcgggcc gcgggcgccc cggcctcgac 43560 

ctgggcgcag ccgttccgtg acgtgcacgc cgcgtcctgg gacgccgaac tgcccgaccc 43620 

gcaggaggtg gaggaccggc tgacgggtct cctgcctgcc ccggggaccc gcctggacct 43680 

ggtccgcgac ctcgcctggc cgatggcgtc gcggggggtc ggcgcggacg accccgacgt 43740 

gctgcgcgcc gcgtgggacg cccgggtcgg cctcgacgcc cagctcaccc cgcagcccct 43800 

ggcggtgacc gaggcggcga tcgccgcggt gcccggggac ccgcaccggc gggcgctgtt 43860 

caccgccgtc gagatgacag ccaccgcgtt cgtcgacgcg gtgctggcgg tgaccgccac 43920 

ggcgggggcg gcccagcgtc tcgccgacga ccccgacgtc gccgcccgtc tcgtcgcgga 43980

ggtgctgcgc ctgcatccga cggcgcacct ggaacggcgt accgccggca ccgagacggt 44040

ggtgggcgag cacacggtcg cggcgggcga cgaggtcgtc gtggtggtcg ccgccgccaa 44100

ccgtgacgcg ggggtcttcg ccgacccgga ccgcctcgac ccggaccggg ccgacgccga 44160

ccgggccctg tccgcccagc gcggtcaccc cggccggttg gaggagctgg tggtggtcct 44220

gaccaccgcc gcactgcgca gcgtcgccaa ggcgctgccc ggtctcaccg ccggtggccc 44280

ggtcgtcagg cgacgtcgtt caccggtcct gcgagccacc gcccactgcc cggtcgaact 44340

ctgaggtgcc tgcgatgcgc gtcgtcttct cctccatggc cagcaagagc cacctgttcg 44400

gtctcgttcc cctcgcctgg gccttccgcg cggcgggcca cgaggtacgg gtcgtcgcct 44460

caccggctct caccgacgac atcacggcgg ccggactgac ggccgtaccg gtcggcaccg 44520

acgtcgacct tgtcgacttc atgacccacg ccgggtacga catcatcgac tacgtccgca 44580

gcctggactt cagcgagcgg gacccggcca cctccacctg ggaccacctg ctcggcatgc 44640

agaccgtcct caccccgacc ttctacgccc tgatgagccc ggactcgctg gtcgagggca 44700

tgatctcctt ctgtcggtcg tggcgacccg actggtcgtc tggaccgcag accttcgccg 44760

cgtcgatcgc ggcgacggtg accggcgtgg cccacgcccg actcctgtgg ggacccgaca 44820

tcacggtacg ggcccggcag aagttcctcg ggctgctgcc cggacagccc gccgcccacc 44880

gggaggaccc cctcgccgag tggctcacct ggtctgtgga gaggttcggc ggccgggtgc 44940

cgcaggacgt cgaggagctg gtggtcgggc agtggacgat cgaccccgcc ccggtcggga 45000

tgcgcctcga caccgggctg aggacggtgg gcatgcgcta cgtcgactac aacggcccgt 45060

cggtggtgcc ggactggctg cacgacgagc cgacccgccg acgggtctgc ctcaccctgg 45120

gcatctccag ccgggagaac agcatcgggc aggtctccgt cgacgacctg ttgggtgcgc 45180

tcggtgacgt cgacgccgag atcatcgcga cagtggacga gcagcagctc gaaggcgtcg 45240

cccacgtccc ggccaacatc cgtacggtcg ggttcgtccc gatgcacgca ctgctgccga 45300

cctgcgcggc gacggtgcac cacggcggtc ccggcagctg gcacaccgcc gccatccacg 45360

gcgtgccgca ggtgatcctg cccgacggct gggacaccgg ggtccgcgcc cagcggaccg 45420

aggaccaggg ggcgggcatc gccctgccgg tgcccgagct gacctccgac cagctccgcg 45480

aggcggtgcg gcgggtcctg gacgatcccg ccttcaccgc cggtgcggcg cggatgcggg 45540

ccgacatgct cgccgagccg tcccccgccg aggtcgtcga cgtctgtgcg gggctggtcg 45600

gggaacggac cgccgtcgga tgagcaccga cgccacccac gtccggctcg gccggtgcgc 45660

cctgctgacc agccggctct ggctgggtac ggcagccctc gccggccagg acgacgccga 45720

cgcagtacgc ctgctcgacc acgcccgttc ccggggcgtc aactgcctcg acaccgccga 45780

cgacgactct gcgtcgacca gtgcccaggt cgccgaggag tcggtcggcc ggtggttggc 45840

cggggacacc ggtcggcggg aggagaccgt cctgtcggtg acggtgggtg tcccaccggg 45900

cgggcaggtc ggcgggggcg gcctctccgc ccggcagatc atcgcctcct gtgagggctc 45960

cctgcggcgt ctcggtgtcg accacgtcga cgtccttcac ctgccccggg tggaccgggt 46020

ggagccgtgg gacgaggtct ggcaggcggt ggacgccctc gtggccgccg gaaaggtctg 46080

ttacgtcggg tcgtcgggct tccccggatg gcacatcgtc gccgcccagg agcacgccgt 46140

ccgccgtcac cgcctcggcc tggtgtccca ccagtgtcgg tacgacctga cgtcgcgcca 46200

tcccgaactg gaggtcctgc ccgccgcgca ggcgtacggg ctcggggtct tcgccaggcc 46260

gacccgcctc ggcggtctgc tcggcggcga cggtccgggc gccgcagccg cacgggcgtc 46320

gggacagccg acggcactgc gctcggcggt ggaggcgtac gaggtgttct gcagagacct 46380

cggcgagcac cccgccgagg tcgcactggc gtgggtgctg tcccggcccg gtgtggcggg 46440

ggcggtcgtc ggtgcgcgga cgcccggacg gctcgactcc gcgctccgcg cctgcggcgt 46500

cgccctcggc gcgacggaac tcaccgccct ggacgggatc ttccccgggg tcgccgcagc 46560

aggggcggcc ccggaggcgt ggctacggtg agagcccgcc cctgacctgc gggaacccgt 46620

gtcggtgcgg cgggacggcc gccgcggtcc ccgccccggt cagccggtgg gggtgagccg 46680

cagcaggtcc ggcgccaccg actcggccac ctccccgacg tggtcggcga ggtagaagtg 46740

cccgcccggg aaggtccggg tacggccggg gactaccgag tacggcagcc agcgttgggc 46800

gtcctccacc gtcgtcaacg ggtcggtgtc accgcagagg gtggtgatgc cggcccgcag 46860

cggcggcccg gcctgccagg cgtaggagcg cagcacccgg tggtcggccc gcagcaccgg 46920

cagcgacatg tccaacagcc cctggtcggc caatgcggcc tcgctgaccc cgagcctgcg 46980

catctgctcg acgagtccgt cctcgtcggg caggtcggtg cgccgctcgt ggacccgggg 47040

ggcggtctgc ccggagacga acaaccgcag cggtcgcacc cccggacgag cctccaggcg 47100

acgggcggtc tcgtaggcga ccagggcgcc catgctgtga ccgaacaggg cgaacggaac 47160

ctcgccgacg aggtcgcgca gcacggccgc gacctcgtcg gcgatctccc cggcggtgcc 47220

gagagcccgc tcgtcacgtc ggtcctgccg gcccgggtac tgcaccgccc acacgtcgac 47280

ctccggggcc agtgcccggg cgaggtcgag gtacgagtcg gcggcggctc ccgcgtgcgg 47340

gaagcagtac agccgggccc ggtgtccgtc ggcggacccg aaccgccgca accaggtgtt 47400

catcggtgtc tcatccgttc ggtcgcaccg gcaggtggtc gatgccgcgc agcaggagcg 47460

accgccgcca gacaacctcg tcggagggga agcccagcga cagcttcggg aagcggtcga 47520

acagggcccc cagggcgacc tctccctcca gcttggccag cgggcggccc atgcagtagt 47580

ggatgccgtg cccgaaggtg aggtgtcccc ggctgtccct ggtgacgtcg aaccggtcgg 47640

ggtcggggaa ctgtcccggg tcgcggttgg ccgccccgtt ggcgatcagg acggtgctgt 47700

acgccgggat cgtcaccccg ccgatctcca cctcggcggt ggcgaaccgg gtggtggtct 47760

ccggtggggc ctggtagcgc aggatctcct ccaccgctcc gggcagcagt gccgggtcct 47820 

tccggaccag cgcgagctgg tcggggtggg tcagcagcag gtaggtgccg atcccgatga 47880 

ggctcaccga cgcctcgaat cccgccagca gcagcaccag cgcgatggag gtgagttcgt 47940 

cgcggctgag ccggtcggcg tcgtcgtcct ggacccggat c 47981 

<210> 2 
<211> 48 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 2 

Met Gly Asp Arg Val Asn Gly His Ala Thr Pro Glu Ser Thr Gln Ser

1 5 10 15

Ala Ile Arg Phe Leu Thr Arg His Gly Gly Pro Pro Thr Ala Thr Asp

20 25 30 

Asp Val His Asp Trp Leu Ala His Arg Ala Ala Glu His Arg Leu Glu 
35 40 45 

<210> 3 
<211> 377 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 3 



Met Ala Val Gly Asp Arg Arg Arg Leu Gly Arg Glu Leu Gln Met Ala

1 5 10 15

Arg Gly Leu Tyr Trp Gly Phe Gly Ala Asn Gly Asp Leu Tyr Ser Met

20 25 30

Leu Leu Ser Gly Arg Asp Asp Asp Pro Trp Thr Trp Tyr Glu Arg Leu

35 40 45

Arg Ala Ala Gly Arg Gly Pro Tyr Ala Ser Arg Ala Gly Thr Trp Val

50 55 60

Val Gly Asp His Arg Thr Ala Ala Glu Val Leu Ala Asp Pro Gly Phe

65 70 75 80

Thr His Gly Pro Pro Asp Ala Ala Arg Trp Met Gln Val Ala His Cys

85 90 95

Pro Ala Ala Ser Trp Ala Gly Pro Phe Arg Glu Phe Tyr Ala Arg Thr

100 105 110

Glu Asp Ala Ala Ser Val Thr Val Asp Ala Asp Trp Leu Gln Gln Arg

115 120 125

Cys Ala Arg Leu Val Thr Glu Leu Gly Ser Arg Phe Asp Leu Val Asn

130 135 140

Asp Phe Ala Arg Glu Val Pro Val Leu Ala Leu Gly Thr Ala Pro Ala

145 150 155 160

Leu Lys Gly Val Asp Pro Asp Arg Leu Arg Ser Trp Thr Ser Ala Thr

165 170 175

Arg Val Cys Leu Asp Ala Gln Val Ser Pro Gln Gln Leu Ala Val Thr

180 185 190

Glu Gln Ala Leu Thr Ala Leu Asp Glu Ile Asp Ala Val Thr Gly Gly

195 200 205

Arg Asp Ala Ala Val Leu Val Gly Val Val Ala Glu Leu Ala Ala Asn

210 215 220

Thr Val Gly Asn Ala Val Leu Ala Val Thr Glu Leu Pro Glu Leu Ala

225 230 235 240

Ala Arg Leu Ala Asp Asp Pro Glu Thr Ala Thr Arg Val Val Thr Glu

245 250 255

Val Ser Arg Thr Ser Pro Gly Val His Leu Glu Arg Arg Thr Ala Ala

260 265 270

Ser Asp Arg Arg Val Gly Gly Val Asp Val Pro Thr Gly Gly Glu Val

275 280 285

Thr Val Val Val Ala Ala Ala Asn Arg Asp Pro Glu Val Phe Thr Asp

290 295 300

Pro Asp Arg Phe Asp Val Asp Arg Gly Gly Asp Ala Glu Ile Leu Ser

305 310 315 320

Ser Arg Pro Gly Ser Pro Arg Thr Asp Leu Asp Ala Leu Val Ala Thr

325 330 335

Leu Ala Thr Ala Ala Leu Arg Ala Ala Ala Pro Val Leu Pro Arg Leu

340 345 350

Ser Arg Ser Gly Pro Val Ile Arg Arg Arg Arg Ser Pro Val Ala Arg

355 360 365

Gly Leu Ser Arg Cys Pro Val Glu Leu

370 375

<210> 4
<211> 436
<212> PRT

<213> Micromonospora megalomicea
<400> 4

Met Arg Val Val Phe Ser Ser Met Ala Val Asn Ser His Leu Phe Gly

1 5 10 15

Leu Val Pro Leu Ala Ser Ala Phe Gln Ala Ala Gly His Glu Val Arg

20 25 30

Val Val Ala Ser Pro Ala Leu Thr Asp Asp Val Thr Gly Ala Gly Leu

35 40 45

Thr Ala Val Pro Val Gly Asp Asp Val Glu Leu Val Glu Trp His Ala

50 55 60

His Ala Gly Gln Asp Ile Val Glu Tyr Met Arg Thr Leu Asp Trp Val

65 70 75 80

Asp Gln Ser His Thr Thr Met Ser Trp Asp Asp Leu Leu Gly Met Gln

85 90 95

Thr Thr Phe Thr Pro Thr Phe Phe Ala Leu Met Ser Pro Asp Ser Leu

100 105 110

Ile Asp Gly Met Val Glu Phe Cys Arg Ser Trp Arg Pro Asp Trp Ile

115 120 125

Val Trp Glu Pro Leu Thr Phe Ala Ala Pro Ile Ala Ala Arg Val Thr

130 135 140

Gly Thr Pro His Ala Arg Met Leu Trp Gly Pro Asp Val Ala Thr Arg

145 150 155 160

Ala Arg Gln Ser Phe Leu Arg Leu Leu Ala His Gln Glu Val Glu His

165 170 175

Arg Glu Asp Pro Leu Ala Glu Trp Phe Asp Trp Thr Leu Arg Arg Phe

180 185 190

Gly Asp Asp Pro His Leu Ser Phe Asp Glu Glu Leu Val Leu Gly Gln

195 200 205

Trp Thr Val Asp Pro Ile Pro Glu Pro Leu Arg Ile Asp Thr Gly Val

210 215 220

Arg Thr Val Gly Met Arg Tyr Val Pro Tyr Asn Gly Pro Ser Val Val

225 230 235 240

Pro Ala Trp Leu Leu Arg Glu Pro Glu Arg Arg Arg Val Cys Leu Thr

245 250 255

Leu Gly Gly Ser Ser Arg Glu His Gly Ile Gly Gln Val Ser Ile Gly

260 265 270

Glu Met Leu Asp Ala Ile Ala Asp Ile Asp Ala Glu Phe Val Ala Thr

275 280 285

Phe Asp Asp Gln Gln Leu Val Gly Val Gly Ser Val Pro Ala Asn Val

290 295 300

Arg Thr Ala Gly Phe Val Pro Met Asn Val Leu Leu Pro Thr Cys Ala

305 310 315 320

Ala Thr Val His His Gly Gly Thr Gly Ser Trp Leu Thr Ala Ala Ile

325 330 335

His Gly Val Pro Gln Ile Ile Leu Ser Asp Ala Asp Thr Glu Val His

340 345 350

Ala Lys Gln Leu Gln Asp Leu Gly Ala Gly Leu Ser Leu Pro Val Ala

355 360 365

Gly Met Thr Ala Glu His Leu Arg Gly Ala Ile Glu Arg Val Leu Asp

370 375 380

Glu Pro Ala Tyr Arg Leu Gly Ala Glu Arg Met Arg Asp Gly Met Arg

385 390 395 400

Thr Asp Pro Ser Pro Ala Gln Val Val Gly Ile Cys Gln Asp Leu Ala

405 410 415

Ala Asp Arg Ala Ala Arg Gly Arg Gln Pro Arg Arg Thr Ala Glu Pro

420 425 430

His Leu Pro Arg

435

<210> 5 
<211> 390 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 5 



Met Val Thr Ser Thr Asn Leu Asp Thr Thr Ala Arg Pro Ala Leu Asn

1 5 10 15

Ser Leu Thr Gly Met Arg Phe Val Ala Ala Phe Leu Val Phe Phe Thr

20 25 30

His Val Leu Ser Arg Leu Ile Pro Asn Ser Tyr Val Tyr Ala Asp Gly

35 40 45

Leu Asp Ala Phe Trp Gln Thr Thr Gly Arg Val Gly Val Ser Phe Phe

50 55 60

Phe Ile Leu Ser Gly Phe Val Leu Thr Trp Ser Ala Arg Ala Ser Asp

65 70 75 80

Ser Val Trp Ser Phe Trp Arg Arg Arg Val Cys Lys Leu Phe Pro Asn

85 90 95

His Leu Val Thr Ala Phe Ala Ala Val Val Leu Phe Leu Val Thr Gly

100 105 110

Gln Ala Val Ser Gly Glu Ala Leu Ile Pro Asn Leu Leu Leu Ile His

115 120 125

Ala Trp Phe Pro Ala Leu Glu Ile Ser Phe Gly Ile Asn Pro Val Ser

130 135 140

Trp Ser Leu Ala Cys Glu Ala Phe Phe Tyr Leu Cys Phe Pro Leu Phe

145 150 155 160

Leu Phe Trp Ile Ser Gly Ile Arg Pro Glu Arg Leu Trp Ala Trp Ala

165 170 175

Ala Val Val Phe Ala Ala Ile Trp Ala Val Pro Val Val Ala Asp Leu

180 185 190

Leu Leu Pro Ser Ser Pro Pro Leu Ile Pro Gly Leu Glu Tyr Ser Ala

195 200 205

Ile Gln Asp Trp Phe Leu Tyr Thr Phe Pro Ala Thr Arg Ser Leu Glu

210 215 220

Phe Ile Leu Gly Ile Ile Leu Ala Arg Ile Leu Ile Thr Gly Arg Trp

225 230 235 240

Ile Asn Val Gly Leu Leu Pro Ala Val Leu Leu Phe Pro Val Phe Phe

245 250 255

Val Ala Ser Leu Phe Leu Pro Gly Val Tyr Ala Ile Ser Ser Ser Met

260 265 270

Met Ile Leu Pro Leu Val Leu Ile Ile Ala Ser Gly Ala Thr Ala Asp

275 280 285

Leu Gln Gln Lys Arg Thr Phe Met Arg Asn Arg Val Met Val Trp Leu

290 295 300

Gly Asp Val Ser Phe Ala Leu Tyr Met Val His Phe Leu Val Ile Val

305 310 315 320

Tyr Gly Ala Asp Leu Leu Gly Phe Ser Gln Thr Glu Asp Ala Pro Leu

325 330 335

Gly Leu Ala Leu Phe Met Ile Ile Pro Phe Leu Ala Val Ser Leu Val

340 345 350

Leu Ser Trp Leu Leu Tyr Arg Phe Val Glu Leu Pro Val Met Arg Asn

355 360 365

Trp Ala Arg Pro Ala Ser Ala Arg Arg Lys Pro Ala Thr Glu Pro Glu

370 375 380

Gln Thr Pro Ser Arg Arg

385 390

<210> 6 
<211> 374 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 6 
















Met 


Thr 


Thr 


Tyr 


Val 

5 


Trp 


Ser 


Tyr 


Leu 


1 

Ala 


Asp 


He 


Leu 


Asp 


Ala 


Val 


Gin 


Lys 








20 










£. 3 


He 


Leu 


Gly 


Gin 


Ser 


Val 


G1U 


Asn 


pne 






35 










A A 




HIS 


Gly 


He 


Ala 


His 


Cys 


val 


Gly 


vai 




50 










DD 






Lys 


Leu 


Ala 


Leu 


Glu 


Ser 


vai 


Gly 


vai 


65 










1 u 








Thr 


Va± 


Ser 


Asn 


Thr 

o c 


Ala 


Ala 


Pro 


i nr 


biy 


Ala 


Arg 


Pro 


vai 


fne 


vai 


Asp 


vai 






1UU 










J. VJ □ 


Asp 


J. lit 


ASp 


Leu 


vai 


blU 


Ala 


nla 


val 




1 1 5 










120 




vai 


Pro 


vai 


HIS 


Leu 


Tyr 


r:i \/ 
uiy 


r;l n 
uin 


t*ys 














1 *35 






blU 


Leu 


ax a 


Asp 


Arg 


Arg 


JjJLy 


L@U 


Lys 


145 


















Ala 


His 


Gly 


Ala 


Arg 


Arg 


Asp 


oiy 


Arg 










165 










Ala 


Ala 


Ala 


Phe 


Ser 


Phe 


Tyr 


Pro 


Thr 








180 










185 


Asp 


Gly 


Gly 


Ala 


Val 


Val 


Thr 


Asn 


Asp 






195 










200 




Arg 


Arg 


Leu 


Arg 


Tyr 


Tyr 


Gly 


Met 


Glu 




210 










215 






Thr 


Pro 


Gly 


His 


Asn 


Ser 


Arg 


Leu 


Asp 


225 










230 








Arg 


Arg 


Lys 


Leu 


Thr 


Arg 


Leu 


Asp 


Ala 










245 










Val 


Ala 


Gin 


Arg 


Tyr 


Val 


Asp 


Gly 


Leu 








260 










265 


Gly 


Leu 


Glu 


Leu 


Pro 


Val 


Val 


Thr 


Asp 




275 










280 




Val 


Tyr 


Val 


Val 


Arg 


His 


Pro 


Arg 


Arg 




290 










295 






Arg 


Asp 


Gly 


Tyr 


Asp 


He 


Ser 


Leu 


Asn 


305 










310 








His 


Thr 


Met 


Thr 


Gly 


Phe 


Ala 


His 


Leu 










325 










Pro 


Val- 


Thr 


Glu 


Arg 


Leu 


Ala 


Gly 


Glu 








340 










345 
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Gin 


Thr Glu Asp 


Ala 


Pro 


Leu 


330 






335 




Phe 


Leu Ala Val 


Ser 


Leu 


Val 






350 






Glu 


Leu Pro Val 


Met 


Arg 


Asn 




365 








Lys 


Pro Ala Thr 


Glu 


Pro 


Glu 



380 



lieu oiu lyr oiu 


Arg 


olu 


Arg 










Vol rne rtla ser 

• 


oiy 

•an 


Ser 


Leu 


Glu Thr Glu Tyr 


Aia 


Arg 


Tyr 


A R 
H O 








rtojj noil oxy i tlx. 


lien 
noil 


Al a 


Va 1 
v ax 


fin 








Gly Arg Asp Asp 


Glu 


Val 


Val 


75 






o u 


Val Leu Ala Tie 
v a x ucu Ai-L a x x c 


Asp 


Glu 


He 


go 




95 




Rrn Asd Glu A<?r> 


Tvr 
j. yx 

110 


Leu 


Met 


Thr Pro Ara Thr 


Lys 


Ala 


He 


125 








Val Asd Met Thr 

» a ^ no^/ i i» Alii. 


Ala 


Leu 


Ara 

«x y 


140 








Leu Val Glu Asp 


Cys 


Ala 


Gin 








160 


Leu Ala Gly Thr 


Met 


Ser 


Asp 


170 




175 




Lys Val Leu Gly 


Ala 
190 


Tyr 


Gly 


Asp Glu Thr Ala 


Arg 


Ala 


Leu 


205 








Glu Val Tyr Tyr 


Val 


Thr 


Arg 


220 








Glu Val Gin Ala 


Glu 


He 


Leu 


235 






240 


Tyr Val Ala Gly 


Arg 


Arg 


Ala 


250 




255 




Ala Asp Leu Gin 


Asp 
270 


Ser 


His 


Gly Asn Glu His 


Val 


Phe 


Tyr 


285 








Asp Glu lie lie 


Lys 


Arg 


Leu 


300 








lie Ser Tyr Pro 


Trp 


Pro 


Val 


315 






320 


Gly Val Ala Ser 


Gly 


Ser 


Leu 


330 




335 




lie Phe Ser Leu 


Pro 
350 


Met 


Tyr

Pro Ser Leu Pro His Asp Leu Gln Asp Arg Val Ile Glu Ala Val Arg

355 360 365

Glu Val Ile Thr Gly Leu
370 

<210> 7 
<211> 257 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 7 



Met 


rro 


Asn 


Ser 


His 




1 1 IX. 


Thr 

X I IX 


Cay Cor Thr 

OCX OCX X UX 


B en Val Ala Prn 
nop val nia ri u 


iyr 


i 








5 










1 c: 
ij 




tj±U 


Arg 




Asp 


He 


Ttir 

i yx 


His 


Asp 


Phea Tvr Hi ^ 


(2 1 i; Arrr fil v T.vq 
V3X y nx. y oxy j ^ 


Gly 








20 
















Tyr 


Arg 


Ala 


Glu Ala 


Asp 


Al A 


Leu 


Val Glu Val 
vox uiu val 


Mia m(j Xiyij nlo 


I I1X 




















Pro 


urn 


ZVl a 

ax a 


Ala 


Thr 


Leu 


T.oi i 

LlC LI 




Val Ala Pv<! 
vox r\xa oys 


f2 1 if Thr 1 w Cor 
oxy nix wiy ocx 


nx o 




50 
















fift 




Leu 


val 


CjIU 


Leu 


Ala 


Asp 


OCX. 


Phe 
tilt: 


Arrc filn Val 
rtj. y vjx u vai 


Val f^l w Val Aen 
VdX oly val riSLJ 


Leu 


c c 
DO 










/ U 






7S 




o u 


Ser 


Ala 


Til - 

Aia 


Met 


Leu 


ri J. a 


Thr 
1 1 IX 


Al a 
Ala 


A 1 .a ZX ttt Ben 
rxx d ax y rid n 


r\t>LJ riu oxy ni y 


fil 11 










85 








Qft 


7J 




Leu 


Hit* 

HIS 


bin 


Gly Asp 




Arg 


Asp 


Pho Car T.on 
flit: OCX JjcU 


nb^J niy niy t i 1C 


Asp 








100 












11ft 
X 1U 




Val Val Thr Cys Met Phe Ser Ser Thr Gly Tyr Leu Val Asp Glu Ala

115 120 125

Glu Leu Asp Arg Ala Val Ala Asn Leu Ala Gly His Leu Ala Pro Gly

130 135 140

Gly Thr Leu Val Val Glu Pro Trp Trp Phe Pro Glu Thr Phe Arg Pro

145 150 155 160

Gly Trp Val Gly Ala Asp Leu Val Thr Ser Gly Asp Arg Arg Ile Ser

165 170 175

Arg Met Ser His Thr Val Pro Ala Gly Leu Pro Asp Arg Thr Ala Ser

180 185 190

Arg Met Thr Ile His Tyr Thr Val Gly Ser Pro Glu Ala Gly Ile Glu

195 200 205

His Phe Thr Glu Val His Val Met Thr Leu Phe Ala Arg Ala Ala Tyr

210 215 220

Glu Gln Ala Phe Gln Arg Ala Gly Leu Ser Cys Ser Tyr Val Gly His

225 230 235 240

Asp Leu Phe Ser Pro Gly Leu Phe Val Gly Val Ala Ala Glu Pro Gly

245 250 255

Arg



<210> 8 
<211> 201 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 8 

Met Arg Val Glu Glu Leu Gly Ile Glu Gly Val Phe Thr Phe Thr Pro

1 5 10 15

Gln Thr Phe Ala Asp Glu Arg Gly Val Phe Gly Thr Ala Tyr Gln Glu

20 25 30

Asp Val Phe Val Ala Ala Leu Gly Arg Pro Leu Phe Pro Val Ala Gln

35 40 45 

Val Ser Thr Thr Arg Ser Arg Arg Gly Val Val Arg Gly Val His Phe 

50 55 60

Thr Thr Met Pro Gly Ser Met Ala Lys Tyr Val Tyr Cys Ala Arg Gly
Glu


Lys


Leu


Ala Ala
230
Val


Leu
245








250
Arg Leu
Gly
Trp
Ile
Ile
Trp
Trp
Ser
Arg
Ile


Val
Val
Ile


Asp
Trp
Glu


Val
Val
Tyr
Ala


Ala


Gln


Glu


Ser
Ser
Gln
Leu


Tyr
Val


Leu 


Pro 


Ala 


Ala 


Gln






205 








Pro 


Leu
Gly


Gly


Leu
Gly


Thr 


Ala 


Val 


Lys 


Ser 


235 










240 


Ala 


Val 


Arg 


Pro 


Leu 


Val
Gly


Ala 


Asp 


Pro 


Ala 


Glu
Ile


Leu
Ala


Val
Ala


Leu
Ala


Ala
Glu


Leu


Glu


Ala


Ile


Phe
Thr
Ser
Glu
Arg


Tyr
Arg


Arg
Pro


Glu
Arg


Ser


Trp
Glu


Ala 


Thr 


Ser Arg 


Phe 


Val 


Phe 


Pro
105


Val 


Asp 


Glu 


Leu 


Val 


Glu 


Phe 


Ala 


Met
Leu


Trp 


Glu 


Pro 


Phe 


Thr 


Phe 


Ala 


Gly 




130 










135 






Gly 


Ala 


Ala 


His 


Ala 


Arg 


Leu 


Leu 


Trp 


145 










150 








Phe 


Arg 


Ser 


Arg 


Ser 


Gln


Asp


Leu


Arg
Arg


Pro


Asp
Leu
Trp


Leu
Gly


Leu


Asp
Glu


Asp


Leu


Ala
Leu


Pro 


Glu 


Ser 


Phe 


Arg 


Leu 


Glu
215






Thr 


Arg 


Thr
Pro


Tyr 


Asn 


Gly 


Ser
230








Arg 


Thr 


Ser
Val


Arg


Arg


Val
Thr
Gln
Ala
Ile
280




Asp 


Pro 


Ala
Val


Pro


Asp


Asn


Val
Met


Asn
Leu


Leu


Pro


Gly
Ala
310
Trp Ala


Thr 


Ala 


Leu 


His
Val


Ala
Glu


Trp
Cys


Val 


Leu 








340
Ala
Trp
Thr


Val


Val
Ala


Glu
Gln


Arg


Asp


Gln


Gly 


Gly 


Leu 


Arg 


Arg 


Phe 


Leu 


Leu 


Ala 


Val 


Ala 










405 










410 










415 




Gln Ala Tyr Thr Gly Gly Val Thr Val Asp Trp Thr Ala Ala Tyr Pro

420 425 430

Gly Val Thr Pro Gly His Leu Pro Ser Ala Val Ala Val Glu Thr Asp

435 440 445

Glu Gly Pro Ser Thr Glu Phe Asp Trp Ala Ala Pro Asp His Val Leu

450 455 460

Arg Ala Arg Leu Leu Glu Ile Val Gly Ala Glu Thr Ala Ala Leu Ala

465 470 475 480

Gly Arg Glu Val Asp Ala Arg Ala Thr Phe Arg Glu Leu Gly Leu Asp

485 490 495

Ser Val Leu Ala Val Gln Leu Arg Thr Arg Leu Ala Thr Ala Thr Gly

500 505 510

Arg Asp Leu His Ile Ala Met Leu Tyr Asp His Pro Thr Pro His Ala

515 520 525

Leu Thr Glu Ala Leu Leu Arg Gly Pro Gln Glu Glu Pro Gly Arg Gly

530 535 540

Glu Glu Thr Ala His Pro Thr Glu Ala Glu Pro Asp Glu Pro Val Ala

545 550 555 560

Val Val Ala Met Ala Cys Arg Leu Pro Gly Gly Val Thr Ser Pro Glu

565 570 575

Glu Phe Trp Glu Leu Leu Ala Glu Gly Arg Asp Ala Val Gly Gly Leu

580 585 590

Pro Thr Asp Arg Gly Trp Asp Leu Asp Ser Leu Phe His Pro Asp Pro

595 600 605

Thr Arg Ser Gly Thr Ala His Gln Arg Ala Gly Gly Phe Leu Thr Gly

610 615 620

Ala Thr Ser Phe Asp Ala Ala Phe Phe Gly Leu Ser Pro Arg Glu Ala

625 630 635 640

Leu Ala Val Glu Pro Gln Gln Arg Ile Thr Leu Glu Leu Ser Trp Glu

645 650 655

Val Leu Glu Arg Ala Gly Ile Pro Pro Thr Ser Leu Arg Thr Ser Arg

660 665 670

Thr Gly Val Phe Val Gly Leu Ile Pro Gln Glu Tyr Gly Pro Arg Leu

675 680 685

Ala Glu Gly Gly Glu Gly Val Glu Gly Tyr Leu Met Thr Gly Thr Thr

690 695 700

Thr Ser Val Ala Ser Gly Arg Val Ala Tyr Thr Leu Gly Leu Glu Gly

705 710 715 720

Pro Ala Ile Ser Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Val

725 730 735

His Leu Ala Cys Gln Ser Leu Arg Arg Gly Glu Ser Thr Met Ala Leu

740 745 750

Ala Gly Gly Val Thr Val Met Pro Thr Pro Gly Met Leu Val Asp Phe

755 760 765

Ser Arg Met Asn Ser Leu Ala Pro Asp Gly Arg Ser Lys Ala Phe Ser

770 775 780

Ala Ala Ala Asp Gly Phe Gly Met Ala Glu Gly Ala Gly Met Leu Leu

785 790 795 800

Leu Glu Arg Leu Ser Asp Ala Arg Arg His Gly His Pro Val Leu Ala

805 810 815

Val Ile Arg Gly Thr Ala Val Asn Ser Asp Gly Ala Ser Asn Gly Leu

820 825 830

Ser Ala Pro Asn Gly Arg Ala Gln Val Arg Val Ile Arg Gln Ala Leu

835 840 845

Ala Glu Ser Gly Leu Thr Pro His Thr Val Asp Val Val Glu Thr His

850 855 860

Gly Thr Gly Thr Arg Leu Gly Asp Pro Ile Glu Ala Arg Ala Leu Ser

865 870 875 880

Asp Ala Tyr Gly Gly Asp Arg Glu His Pro Leu Arg Ile Gly Ser Val

885 890 895

Lys Ser Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val Ala Gly Leu

900 905 910

Ile Lys Leu Val Leu Ala Met Gln Ala Gly Val Leu Pro Arg Thr Leu

915 920 925

His Ala Asp Glu Pro Ser Pro Glu Ile Asp Trp Ser Ser Gly Ala Ile

930 935 940

Ser Leu Leu Gln Glu Pro Ala Ala Trp Pro Ala Gly Glu Arg Pro Arg

945 950 955 960

Arg Ala Gly Val Ser Ser Phe Gly Ile Ser Gly Thr Asn Ala His Ala

965 970 975

Ile Ile Glu Glu Ala Pro Pro Thr Gly Asp Asp Thr Arg Pro Asp Arg

980 985 990

Met Gly Pro Val Val Pro Trp Val Leu Ser Ala Ser Thr Gly Glu Ala

995 1000 1005

Leu Arg Ala Arg Ala Ala Arg Leu Ala Gly His Leu Arg Glu His Pro

1010 1015 1020

Asp Gln Asp Leu Asp Asp Val Ala Tyr Ser Leu Ala Thr Gly Arg Ala

1025 1030 1035 1040

Ala Leu Ala Tyr Arg Ser Gly Phe Val Pro Ala Asp Ala Ser Thr Ala

1045 1050 1055

Leu Arg Ile Leu Asp Glu Leu Ala Ala Gly Gly Ser Gly Asp Ala Val

1060 1065 1070

Thr Gly Thr Ala Arg Ala Pro Gln Arg Val Val Phe Val Phe Pro Gly
1075 1080 1085
Gln Gly Trp Gln Trp Ala Gly Met Ala Val Asp Leu Leu Asp Gly Asp

1090 1095 1100 

Pro Val Phe Ala Ser Val Leu Arg Glu Cys Ala Asp Ala Leu Glu Pro 
1105 1110 1115 1120 

Tyr Leu Asp Phe Glu Ile Val Pro Phe Leu Arg Ala Glu Ala Gln Arg

1125 1130 1135

Arg Thr Pro Asp His Thr Leu Ser Thr Asp Arg Val Asp Val Val Gln

1140 1145 1150

Pro Val Leu Phe Ala Val Met Val Ser Leu Ala Ala Arg Trp Arg Ala

1155 1160 1165

Tyr Gly Val Glu Pro Ala Ala Val Ile Gly His Ser Gln Gly Glu Ile

1170 1175 1180 

Ala Ala Ala Cys Val Ala Gly Ala Leu Ser Leu Asp Asp Ala Ala Arg 
1185 1190 1195 1200 

Ala Val Ala Leu Arg Ser Arg Val Ile Ala Thr Met Pro Gly Asn Gly

1205 1210 1215

Ala Met Ala Ser Ile Ala Ala Ser Val Asp Glu Val Ala Ala Arg Ile

1220 1225 1230

Asp Gly Arg Val Glu Ile Ala Ala Val Asn Gly Pro Arg Ala Val Val

1235 1240 1245 

Val Ser Gly Asp Arg Asp Asp Leu Asp Arg Leu Val Ala Ser Cys Thr 

1250 1255 1260

Val Glu Gly Val Arg Ala Lys Arg Leu Pro Val Asp Tyr Ala Ser His 
1265 1270 1275 1280

Ser Ser His Val Glu Ala Val Arg Asp Ala Leu His Ala Glu Leu Gly 

1285 1290 1295 

Glu Phe Arg Pro Leu Pro Gly Phe Val Pro Phe Tyr Ser Thr Val Thr 

1300 1305 1310 

Gly Arg Trp Val Glu Pro Ala Glu Leu Asp Ala Gly Tyr Trp Phe Arg 

1315 1320 1325 

Asn Leu Arg His Arg Val Arg Phe Ala Asp Ala Val Arg Ser Leu Ala 

1330 1335 1340 

Asp Gln Gly Tyr Thr Thr Phe Leu Glu Val Ser Ala His Pro Val Leu
1345 1350 1355 1360

Thr Thr Ala Ile Glu Glu Ile Gly Glu Asp Arg Gly Gly Asp Leu Val

1365 1370 1375

Ala Val His Ser Leu Arg Arg Gly Ala Gly Gly Pro Val Asp Phe Gly

1380 1385 1390 

Ser Ala Leu Ala Arg Ala Phe Val Ala Gly Val Ala Val Asp Trp Glu 

1395 1400 1405 

Ser Ala Tyr Gln Gly Ala Gly Ala Arg Arg Val Pro Leu Pro Thr Tyr

1410 1415 1420

Pro Phe Gln Arg Glu Arg Phe Trp Leu Glu Pro Asn Pro Ala Arg Arg
1425 1430 1435 1440

Val Ala Asp Ser Asp Asp Val Ser Ser Leu Arg Tyr Arg Ile Glu Trp

1445 1450 1455 

His Pro Thr Asp Pro Gly Glu Pro Gly Arg Leu Asp Gly Thr Trp Leu 

1460 1465 1470 

Leu Ala Thr Tyr Pro Gly Arg Ala Asp Asp Arg Val Glu Ala Ala Arg 

1475 1480 1485 

Gln Ala Leu Glu Ser Ala Gly Ala Arg Val Glu Asp Leu Val Val Glu

1490 1495 1500 

Pro Arg Thr Gly Arg Val Asp Leu Val Arg Arg Leu Asp Ala Val Gly 
1505 1510 1515 1520 

Pro Val Ala Gly Val Leu Cys Leu Phe Ala Val Ala Glu Pro Ala Ala 

1525 1530 1535 

Glu His Ser Pro Leu Ala Val Thr Ser Leu Ser Asp Thr Leu Asp Leu 

1540 1545 1550 

Thr Gln Ala Val Ala Gly Ser Gly Arg Glu Cys Pro Ile Trp Val Val

1555 1560 1565

Thr Glu Asn Ala Val Ala Val Gly Pro Phe Glu Arg Leu Arg Asp Pro

1570 1575 1580

Ala His Gly Ala Leu Trp Ala Leu Gly Arg Val Val Ala Leu Glu Asn
1585 1590 1595 1600

Pro Ala Val Trp Gly Gly Leu Val Asp Val Pro Ser Gly Ser Val Ala 

1605 1610 1615

Glu Leu Ser Arg His Leu Gly Thr Thr Leu Ser Gly Ala Gly Glu Asp 

1620 1625 1630 

Gln Val Ala Leu Arg Pro Asp Gly Thr Tyr Ala Arg Arg Trp Cys Arg

1635 1640 1645

Ala Gly Ala Gly Gly Thr Gly Arg Trp Gln Pro Arg Gly Thr Val Leu

1650 1655 1660

Val Thr Gly Gly Thr Gly Gly Val Gly Arg His Val Ala Arg Trp Leu
1665 1670 1675 1680

Ala Arg Gln Gly Thr Pro Cys Leu Val Leu Ala Ser Arg Arg Gly Pro

1685 1690 1695 

Asp Ala Asp Gly Val Glu Glu Leu Leu Thr Glu Leu Ala Asp Leu Gly 

1700 1705 1710 

Thr Arg Ala Thr Val Thr Ala Cys Asp Val Thr Asp Arg Glu Gln Leu

1715 1720 1725 

Arg Ala Leu Leu Ala Thr Val Asp Asp Glu His Pro Leu Ser Ala Val 

1730 1735 1740 

Phe His Val Ala Ala Thr Leu Asp Asp Gly Thr Val Glu Thr Leu Thr 
1745 1750 1755 1760

Gly Asp Arg Ile Glu Arg Ala Asn Arg Ala Lys Val Leu Gly Ala Arg

1765 1770 1775 

Asn Leu His Glu Leu Thr Arg Asp Ala Asp Leu Asp Ala Phe Val Leu 

1780 1785 1790 

Phe Ser Ser Ser Thr Ala Ala Phe Gly Ala Pro Gly Leu Gly Gly Tyr 

1795 1800 1805 

Val Pro Gly Asn Ala Tyr Leu Asp Gly Leu Ala Gln Gln Arg Arg Ser

1810 1815 1820

Glu Gly Leu Pro Ala Thr Ser Val Ala Trp Gly Thr Trp Ala Gly Ser
1825 1830 1835 1840

Gly Met Ala Glu Gly Pro Val Ala Asp Arg Phe Arg Arg His Gly Val 

1845 1850 1855 

Met Glu Met His Pro Asp Gln Ala Val Glu Gly Leu Arg Val Ala Leu

1860 1865 1870

Val Gln Gly Glu Val Ala Pro Ile Val Val Asp Ile Arg Trp Asp Arg

1875 1880 1885

Phe Leu Leu Ala Tyr Thr Ala Gln Arg Pro Thr Arg Leu Phe Asp Thr

1890 1895 1900

Leu Asp Glu Ala Arg Arg Ala Ala Pro Gly Pro Asp Ala Gly Pro Gly
1905 1910 1915 1920

Val Ala Ala Leu Ala Gly Leu Pro Val Gly Glu Arg Glu Lys Ala Val 

1925 1930 1935 

Leu Asp Leu Val Arg Thr His Ala Ala Ala Val Leu Gly His Ala Ser 

1940 1945 1950 

Ala Glu Gln Val Pro Val Asp Arg Ala Phe Ala Glu Leu Gly Val Asp

1955 1960 1965

Ser Leu Ser Ala Leu Glu Leu Arg Asn Arg Leu Thr Thr Ala Thr Gly

1970 1975 1980

Val Arg Leu Ala Thr Thr Thr Val Phe Asp His Pro Asp Val Arg Thr
1985 1990 1995 2000

Leu Ala Gly His Leu Ala Ala Glu Leu Gly Gly Gly Ser Gly Arg Glu 

2005 2010 2015 

Arg Pro Gly Gly Glu Ala Pro Thr Val Ala Pro Thr Asp Glu Pro Ile

2020 2025 2030

Ala Ile Val Gly Met Ala Cys Arg Leu Pro Gly Gly Val Asp Ser Pro

2035 2040 2045 

Glu Gln Leu Trp Glu Leu Ile Val Ser Gly Arg Asp Thr Ala Ser Ala
2050 2055 2060
Ala Pro Gly Asp Arg Ser Trp Asp Pro Ala Glu Leu Met Val Ser Asp
2065 2070 2075 2080 

Thr Thr Gly Thr Arg Thr Ala Phe Gly Asn Phe Met Pro Gly Ala Gly 

2085 2090 2095 

Glu Phe Asp Ala Ala Phe Phe Gly Ile Ser Pro Arg Glu Ala Leu Ala

2100 2105 2110

Met Asp Pro Gln Gln Arg His Ala Leu Glu Thr Thr Trp Glu Ala Leu

2115 2120 2125

Glu Asn Ala Gly Ile Arg Pro Glu Ser Leu Arg Gly Thr Asp Thr Gly

2130 2135 2140

Val Phe Val Gly Met Ser His Gln Gly Tyr Ala Thr Gly Arg Pro Lys
2145 2150 2155 2160 

Pro Glu Asp Glu Val Asp Gly Tyr Leu Leu Thr Gly Asn Thr Ala Ser 

2165 2170 2175 

Val Ala Ser Gly Arg Ile Ala Tyr Val Leu Gly Leu Glu Gly Pro Ala

2180 2185 2190

Ile Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Leu His Val

2195 2200 2205 

Ala Ala Gly Ser Leu Arg Ser Gly Asp Cys Gly Leu Ala Val Ala Gly 

2210 2215 2220 

Gly Val Ser Val Met Ala Gly Pro Glu Val Phe Arg Glu Phe Ser Arg 
2225 2230 2235 2240 

Gln Gly Ala Leu Ala Pro Asp Gly Arg Cys Lys Pro Phe Ser Asp Glu

2245 2250 2255

Ala Asp Gly Phe Gly Leu Gly Glu Gly Ser Ala Phe Val Val Leu Gln

2260 2265 2270 

Arg Leu Ser Val Ala Val Arg Glu Gly Arg Arg Val Leu Gly Val Val 

2275 2280 2285 

Val Gly Ser Ala Val Asn Gln Asp Gly Ala Ser Asn Gly Leu Ala Ala

2290 2295 2300

Pro Ser Gly Val Ala Gln Gln Arg Val Ile Arg Arg Ala Trp Gly Arg
2305 2310 2315 2320 

Ala Gly Val Ser Gly Gly Asp Val Gly Val Val Glu Ala His Gly Thr 

2325 2330 2335 

Gly Thr Arg Leu Gly Asp Pro Val Glu Leu Gly Ala Leu Leu Gly Thr 

2340 2345 2350 

Tyr Gly Val Gly Arg Gly Gly Val Gly Pro Val Val Val Gly Ser Val 

2355 2360 2365 

Lys Ala Asn Val Gly His Val Gln Ala Ala Ala Gly Val Val Gly Val

2370 2375 2380

Ile Lys Val Val Leu Gly Leu Gly Arg Gly Leu Val Gly Pro Met Val
2385 2390 2395 2400 

Cys Arg Gly Gly Leu Ser Gly Leu Val Asp Trp Ser Ser Gly Gly Leu 

2405 2410 2415 

Val Val Ala Asp Gly Val Arg Gly Trp Pro Val Gly Val Asp Gly Val 

2420 2425 2430

Arg Arg Gly Gly Val Ser Ala Phe Gly Val Ser Gly Thr Asn Ala His 

2435 2440 2445 

Val Val Val Ala Glu Ala Pro Gly Ser Val Val Gly Ala Glu Arg Pro 

2450 2455 2460 

Val Glu Gly Ser Ser Arg Gly Leu Val Gly Val Val Gly Gly Val Val 
2465 2470 2475 2480 

Pro Val Val Leu Ser Ala Lys Thr Glu Thr Ala Leu His Ala Gln Ala

2485 2490 2495 

Arg Arg Leu Ala Asp His Leu Glu Thr His Pro Asp Val Pro Met Thr 

2500 2505 2510 

Asp Val Val Trp Thr Leu Thr Gln Ala Arg Gln Arg Phe Asp Arg Arg

2515 2520 2525

Ala Val Leu Leu Ala Ala Asp Arg Thr Gln Ala Val Glu Arg Leu Arg

2530 2535 2540 

Gly Leu Ala Gly Gly Glu Pro Gly Thr Gly Val Val Ser Gly Val Ala

2545 2550 2555 2560

Ser Gly Gly Gly Val Val Phe Val Phe Pro Gly Gln Gly Gly Gln Trp

2565 2570 2575 

Val Gly Met Ala Arg Gly Leu Leu Ser Val Pro Val Phe Val Glu Ser 

2580 2585 2590 

Val Val Glu Cys Asp Ala Val Val Ser Ser Val Val Gly Phe Ser Val 

2595 2600 2605 

Leu Gly Val Leu Glu Gly Arg Ser Gly Ala Pro Ser Leu Asp Arg Val 

2610 2615 2620 

Asp Val Val Gln Pro Val Leu Phe Val Val Met Val Ser Leu Ala Arg
2625 2630 2635 2640 

Leu Trp Arg Trp Cys Gly Val Val Pro Ala Ala Val Val Gly His Ser 

2645 2650 2655 

Gln Gly Glu Ile Ala Ala Ala Val Val Ala Gly Val Leu Ser Val Gly

2660 2665 2670

Asp Gly Ala Arg Val Val Ala Leu Arg Ala Arg Ala Leu Arg Ala Leu 

2675 2680 2685 

Ala Gly His Gly Gly Met Ala Ser Val Arg Arg Gly Arg Asp Asp Val 

2690 2695 2700 

Gln Lys Leu Leu Asp Ser Gly Pro Trp Thr Gly Lys Leu Glu Ile Ala
2705 2710 2715 2720 

Ala Val Asn Gly Pro Asp Ala Val Val Val Ser Gly Asp Pro Arg Ala 

2725 2730 2735 

Val Thr Glu Leu Val Glu His Cys Asp Gly Ile Gly Val Arg Ala Arg

2740 2745 2750

Thr Ile Pro Val Asp Tyr Ala Ser His Ser Ala Gln Val Glu Ser Leu

2755 2760 2765

Arg Glu Glu Leu Leu Ser Val Leu Ala Gly Ile Glu Gly Arg Pro Ala

2770 2775 2780

Thr Val Pro Phe Tyr Ser Thr Leu Thr Gly Gly Phe Val Asp Gly Thr
2785 2790 2795 2800

Glu Leu Asp Ala Asp Tyr Trp Tyr Arg Asn Leu Arg His Pro Val Arg

2805 2810 2815 

Phe His Ala Ala Val Glu Ala Leu Ala Ala Arg Asp Leu Thr Thr Phe 

2820 2825 2830 

Val Glu Val Ser Pro His Pro Val Leu Ser Met Ala Val Gly Glu Thr 

2835 2840 2845 

Leu Ala Asp Val Glu Ser Ala Val Thr Val Gly Thr Leu Glu Arg Asp 

2850 2855 2860 

Thr Asp Asp Val Glu Arg Phe Leu Thr Ser Leu Ala Glu Ala His Val 
2865 2870 2875 2880 

His Gly Val Pro Val Asp Trp Ala Ala Val Leu Gly Ser Gly Thr Leu 

2885 2890 2895 

Val Asp Leu Pro Thr Tyr Pro Phe Gln Gly Arg Arg Phe Trp Leu His

2900 2905 2910 

Pro Asp Arg Gly Pro Arg Asp Asp Val Ala Asp Trp Phe His Arg Val 

2915 2920 2925 

Asp Trp Thr Ala Thr Ala Thr Asp Gly Ser Ala Arg Leu Asp Gly Arg 

2930 2935 2940 

Trp Leu Val Val Val Pro Glu Gly Tyr Thr Asp Asp Gly Trp Val Val 
2945 2950 2955 2960 

Glu Val Arg Ala Ala Leu Ala Ala Gly Gly Ala Glu Pro Val Val Thr 

2965 2970 2975 

Thr Val Glu Glu Val Thr Asp Arg Val Gly Asp Ser Asp Ala Val Val 

2980 2985 2990 

Ser Met Leu Gly Leu Ala Asp Asp Gly Ala Ala Glu Thr Leu Ala Leu 

2995 3000 3005 

Leu Arg Arg Leu Asp Ala Gln Ala Ser Thr Thr Pro Leu Trp Val Val

3010 3015 3020

Thr Val Gly Ala Val Ala Pro Ala Gly Pro Val Gln Arg Pro Glu Gln
3025 3030 3035 3040

Ala Thr Val Trp Gly Leu Ala Leu Val Ala Ser Leu Glu Arg Gly His

3045 3050 3055

Arg Trp Thr Gly Leu Leu Asp Leu Pro Gln Thr Pro Asp Pro Gln Leu

3060 3065 3070

Arg Pro Arg Leu Val Glu Ala Leu Ala Gly Ala Glu Asp Gln Val Ala

3075 3080 3085

Val Arg Ala Asp Ala Val His Ala Arg Arg Ile Val Pro Thr Pro Val

3090 3095 3100

Thr Gly Ala Gly Pro Tyr Thr Ala Pro Gly Gly Thr Ile Leu Val Thr
3105 3110 3115 3120 

Gly Gly Thr Ala Gly Leu Gly Ala Val Thr Ala Arg Trp Leu Ala Glu 

3125 3130 3135 

Arg Gly Ala Glu His Leu Ala Leu Val Ser Arg Arg Gly Pro Gly Thr 

3140 3145 3150 

Ala Gly Val Asp Glu Val Val Arg Asp Leu Thr Gly Leu Gly Val Arg 

3155 3160 3165 

Val Ser Val His Ser Cys Asp Val Gly Asp Arg Glu Ser Val Gly Ala 

3170 3175 3180 

Leu Val Gln Glu Leu Thr Ala Ala Gly Asp Val Val Arg Gly Val Val
3185 3190 3195 3200

His Ala Ala Gly Leu Pro Gln Gln Val Pro Leu Thr Asp Met Asp Pro

3205 3210 3215 

Ala Asp Leu Ala Asp Val Val Ala Val Lys Val Asp Gly Ala Val His 

3220 3225 3230

Leu Ala Asp Leu Cys Pro Glu Ala Glu Leu Phe Leu Leu Phe Ser Ser 

3235 3240 3245 

Gly Ala Gly Val Trp Gly Ser Ala Arg Gln Gly Ala Tyr Ala Ala Gly

3250 3255 3260 

Asn Ala Phe Leu Asp Ala Phe Ala Arg His Arg Arg Asp Arg Gly Leu 
3265 3270 3275 3280 

Pro Ala Thr Ser Val Ala Trp Gly Leu Trp Ala Ala Gly Gly Met Thr 

3285 3290 3295 

Gly Asp Gln Glu Ala Val Ser Phe Leu Arg Glu Arg Gly Val Arg Pro

3300 3305 3310 

Met Ser Val Pro Arg Ala Leu Glu Ala Leu Glu Arg Val Leu Thr Ala 

3315 3320 3325 

Gly Glu Thr Ala Val Val Val Ala Asp Val Asp Trp Ala Ala Phe Ala 

3330 3335 3340 

Glu Ser Tyr Thr Ser Ala Arg Pro Arg Pro Leu Leu His Arg Leu Val 
3345 3350 3355 3360 

Thr Pro Ala Ala Ala Val Gly Glu Arg Asp Glu Pro Arg Glu Gln Thr

3365 3370 3375 

Leu Arg Asp Arg Leu Ala Ala Leu Pro Arg Ala Glu Arg Ser Ala Glu 

3380 3385 3390 

Leu Val Arg Leu Val Arg Arg Asp Ala Ala Ala Val Leu Gly Ser Asp 

3395 3400 3405 

Ala Lys Ala Val Pro Ala Thr Thr Pro Phe Lys Asp Leu Gly Phe Asp 

3410 3415 3420 

Ser Leu Ala Ala Val Arg Phe Arg Asn Arg Leu Ala Ala His Thr Gly 
3425 3430 3435 3440 

Leu Arg Leu Pro Ala Thr Leu Val Phe Glu His Pro Asn Ala Ala Ala 

3445 3450 3455 

Val Ala Asp Leu Leu His Asp Arg Leu Gly Glu Ala Gly Glu Pro Thr 

3460 3465 3470 

Pro Val Arg Ser Val Gly Ala Gly Leu Ala Ala Leu Glu Gln Ala Leu

3475 3480 3485 

Pro Asp Ala Ser Asp Thr Glu Arg Val Glu Leu Val Glu Arg Leu Glu 

3490 3495 3500 

Arg Met Leu Ala Gly Leu Arg Pro Glu Ala Gly Ala Gly Ala Asp Ala 
3505 3510 3515 3520 

Pro Thr Ala Gly Asp Asp Leu Gly Glu Ala Gly Val Asp Glu Leu Leu

3525 3530 3535

Asp Ala Leu Glu Arg Glu Leu Asp Ala Arg

3540 3545

<210> 14 
<211> 3562 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 14 

Met Thr Asp Asn Asp Lys Val Ala Glu Tyr Leu Arg Arg Ala Thr Leu 

1 5 10 15

Asp Leu Arg Ala Ala Arg Lys Arg Leu Arg Glu Leu Gln Ser Asp Pro

20 25 30

Ile Ala Val Val Gly Met Ala Cys Arg Leu Pro Gly Gly Val His Leu

35 40 45

Pro Gln His Leu Trp Asp Leu Leu Arg Gln Gly His Glu Thr Val Ser

50 55 60 

Thr Phe Pro Thr Gly Arg Gly Trp Asp Leu Ala Gly Leu Phe His Pro 
65 70 75 80 

Asp Pro Asp His Pro Gly Thr Ser Tyr Val Asp Arg Gly Gly Phe Leu 

85 90 95 

Asp Asp Val Ala Gly Phe Asp Ala Glu Phe Phe Gly Ile Ser Pro Arg

100 105 110

Glu Ala Thr Ala Met Asp Pro Gln Gln Arg Leu Leu Leu Glu Thr Ser

115 120 125

Trp Glu Leu Val Glu Ser Ala Gly Ile Asp Pro His Ser Leu Arg Gly

130 135 140 

Thr Pro Thr Gly Val Phe Leu Gly Val Ala Arg Leu Gly Tyr Gly Glu 
145 150 155 160 

Asn Gly Thr Glu Ala Gly Asp Ala Glu Gly Tyr Ser Val Thr Gly Val 

165 170 175 

Ala Pro Ala Val Ala Ser Gly Arg Ile Ser Tyr Ala Leu Gly Leu Glu

180 185 190

Gly Pro Ser Ile Ser Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala

195 200 205 

Leu His Leu Ala Val Glu Ser Leu Arg Leu Gly Glu Ser Ser Leu Ala 

210 215 220 

Val Val Gly Gly Ala Ala Val Met Ala Thr Pro Gly Val Phe Val Asp 
225 230 235 240

Phe Ser Arg Gln Arg Ala Leu Ala Ala Asp Gly Arg Ser Lys Ala Phe

245 250 255 

Gly Ala Ala Ala Asp Gly Phe Gly Phe Ser Glu Gly Val Ser Leu Val 

260 265 270 

Leu Leu Glu Arg Leu Ser Glu Ala Glu Ser Asn Gly His Glu Val Leu 

275 280 285

Ala Val Ile Arg Gly Ser Ala Leu Asn Gln Asp Gly Ala Ser Asn Gly

290 295 300

Leu Ala Ala Pro Asn Gly Thr Ala Gln Arg Lys Val Ile Arg Gln Ala
305 310 315 320 

Leu Arg Asn Cys Gly Leu Thr Pro Ala Asp Val Asp Ala Val Glu Ala 

325 330 335 

His Gly Thr Gly Thr Thr Leu Gly Asp Pro Ile Glu Ala Asn Ala Leu

340 345 350

Leu Asp Thr Tyr Gly Arg Asp Arg Asp Pro Asp His Pro Leu Trp Leu

355 360 365

Gly Ser Val Lys Ser Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val

370 375 380

Thr Gly Leu Leu Lys Met Val Leu Ala Leu Arg His Glu Glu Leu Pro 
385 390 395 400 

Ala Thr Leu His Val Asp Glu Pro Thr Pro His Val Asp Trp Ser Ser

Tyr Pro Phe Gln Arg Lys Pro Tyr Trp Leu Arg Ser Ser Ala Pro Ala

900 905 910

Pro Ala Ser His Asp Leu Ala Tyr Arg Val Ser Trp Thr Pro Ile Thr

915 920 925 

Pro Pro Gly Asp Gly Val Leu Asp Gly Asp Trp Leu Val Val His Pro 

930 935 940 

Gly Gly Ser Thr Gly Trp Val Asp Gly Leu Ala Ala Ala Ile Thr Ala
945 950 955 960

Gly Gly Gly Arg Val Val Ala His Pro Val Asp Ser Val Thr Ser Arg

965 970 975

Thr Gly Leu Ala Glu Ala Leu Ala Arg Arg Asp Gly Thr Phe Arg Gly 

980 985 990 

Val Leu Ser Trp Val Ala Thr Asp Glu Arg His Val Glu Ala Gly Ala 

995 1000 1005 

Val Ala Leu Leu Thr Leu Ala Gln Ala Leu Gly Asp Ala Gly Ile Asp

1010 1015 1020

Ala Pro Leu Trp Cys Leu Thr Gln Glu Ala Val Arg Thr Pro Val Asp
1025 1030 1035 1040

Gly Asp Leu Ala Arg Pro Ala Gln Ala Ala Leu His Gly Phe Ala Gln

1045 1050 1055 

Val Ala Arg Leu Glu Leu Ala Arg Arg Phe Gly Gly Val Leu Asp Leu 

1060 1065 1070 

Pro Ala Thr Val Asp Ala Ala Gly Thr Arg Leu Val Ala Ala Val Leu 

1075 1080 1085 

Ala Gly Gly Gly Glu Asp Val Val Ala Val Arg Gly Asp Arg Leu Tyr 

1090 1095 1100 

Gly Arg Arg Leu Val Arg Ala Thr Leu Pro Pro Pro Gly Gly Gly Phe 
1105 1110 1115 1120 

Thr Pro His Gly Thr Val Leu Val Thr Gly Ala Ala Gly Pro Val Gly 

1125 1130 1135 

Gly Arg Leu Ala Arg Trp Leu Ala Glu Arg Gly Ala Thr Arg Leu Val 

1140 1145 1150

Leu Pro Gly Ala His Pro Gly Glu Glu Leu Leu Thr Ala Ile Arg Ala

1155 1160 1165 

Ala Gly Ala Thr Ala Val Val Cys Glu Pro Glu Ala Glu Ala Leu Arg 

1170 1175 1180 

Thr Ala Ile Gly Gly Glu Leu Pro Thr Ala Leu Val His Ala Glu Thr
1185 1190 1195 1200

Leu Thr Asn Phe Ala Gly Val Ala Asp Ala Asp Pro Glu Asp Phe Ala

1205 1210 1215

Ala Thr Val Ala Ala Lys Thr Ala Leu Pro Thr Val Leu Ala Glu Val 

1220 1225 1230 

Leu Gly Asp His Arg Leu Glu Arg Glu Val Tyr Cys Ser Ser Val Ala 

1235 1240 1245 

Gly Val Trp Gly Gly Val Gly Met Ala Ala Tyr Ala Ala Gly Ser Ala 

1250 1255 1260

Tyr Leu Asp Ala Leu Val Glu His Arg Arg Ala Arg Gly His Ala Ser 
1265 1270 1275 1280 

Ala Ser Val Ala Trp Thr Pro Trp Ala Leu Pro Gly Ala Val Asp Asp 

1285 1290 1295 

Gly Arg Leu Arg Glu Arg Gly Leu Arg Ser Leu Asp Val Ala Asp Ala 

1300 1305 1310

Leu Gly Thr Trp Glu Arg Leu Leu Arg Ala Gly Ala Val Ser Val Ala

1315 1320 1325

Val Ala Asp Val Asp Trp Ser Val Phe Thr Glu Gly Phe Ala Ala Ile

1330 1335 1340 

Arg Pro Thr Pro Leu Phe Asp Glu Leu Leu Asp Arg Arg Gly Asp Pro 
1345 1350 1355 1360 

Asp Gly Ala Pro Val Asp Arg Pro Gly Glu Pro Ala Gly Glu Trp Gly 

1365 1370 1375 

Arg Arg Ile Ala Ala Leu Ser Pro Gln Glu Gln Arg Glu Thr Leu Leu

1380 1385 1390

Thr Leu Val Gly Glu Thr Val Ala Glu Val Leu Gly His Glu Thr Gly

1395 1400 1405

Thr Glu Ile Asn Thr Arg Arg Ala Phe Ser Glu Leu Gly Leu Asp Ser

1410 1415 1420

Leu Gly Ser Met Ala Leu Arg Gln Arg Leu Ala Ala Arg Thr Gly Leu
1425 1430 1435 1440 

Arg Met Pro Ala Ser Leu Val Phe Asp His Pro Thr Val Thr Ala Leu 

1445 1450 1455 

Ala Arg Tyr Leu Arg Arg Leu Val Val Gly Asp Ser Asp Pro Thr Pro 

1460 1465 1470 

Val Arg Val Phe Gly Pro Thr Asp Glu Ala Glu Pro Val Ala Val Val 

1475 1480 1485 

Gly Ile Gly Cys Arg Phe Pro Gly Gly Ile Ala Thr Pro Glu Asp Leu

1490 1495 1500

Trp Arg Val Val Ser Glu Gly Thr Ser Ile Thr Thr Gly Phe Pro Thr
1505 1510 1515 1520 

Asp Arg Gly Trp Asp Leu Arg Arg Leu Tyr His Pro Asp Pro Asp His 

1525 1530 1535 

Pro Gly Thr Ser Tyr Val Asp Arg Gly Gly Phe Leu Asp Gly Ala Pro 

1540 1545 1550 

Asp Phe Asp Pro Gly Phe Phe Gly Ile Thr Pro Arg Glu Ala Leu Ala

1555 1560 1565

Met Asp Pro Gln Gln Arg Leu Thr Leu Glu Ile Ala Trp Glu Ala Val

1570 1575 1580

Glu Arg Ala Gly Ile Asp Pro Glu Thr Leu Leu Gly Ser Asp Thr Gly
1585 1590 1595 1600

Val Phe Val Gly Met Asn Gly Gln Ser Tyr Leu Gln Leu Leu Thr Gly

1605 1610 1615

Glu Gly Asp Arg Leu Asn Gly Tyr Gln Gly Leu Gly Asn Ser Ala Ser

1620 1625 1630 

Val Leu Ser Gly Arg Val Ala Tyr Thr Phe Gly Trp Glu Gly Pro Ala

1635 1640 1645

Leu Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Ile His Leu

1650 1655 1660

Ala Met Gln Ser Leu Arg Arg Gly Glu Cys Ser Leu Ala Leu Ala Gly
1665 1670 1675 1680 

Gly Val Thr Val Met Ala Asp Pro Tyr Thr Phe Val Asp Phe Ser Ala 

1685 1690 1695 

Gln Arg Gly Leu Ala Ala Asp Gly Arg Cys Lys Ala Phe Ser Ala Gln

1700 1705 1710 

Ala Asp Gly Phe Ala Leu Ala Glu Gly Val Ala Ala Leu Val Leu Glu 

1715 1720 1725 

Pro Leu Ser Lys Ala Arg Arg Asn Gly His Gln Val Leu Ala Val Leu

1730 1735 1740

Arg Gly Ser Ala Val Asn Gln Asp Gly Ala Ser Asn Gly Leu Ala Ala
1745 1750 1755 1760

Pro Asn Gly Pro Ser Gln Glu Arg Val Ile Arg Gln Ala Leu Thr Ala

1765 1770 1775 

Ser Gly Leu Arg Pro Ala Asp Val Asp Met Val Glu Ala His Gly Thr 

1780 1785 1790 

Gly Thr Glu Leu Gly Asp Pro Ile Glu Ala Gly Ala Leu Ile Ala Ala

1795 1800 1805

Tyr Gly Arg Asp Arg Asp Arg Pro Leu Trp Leu Gly Ser Val Lys Thr

1810 1815 1820

Asn Ile Gly His Thr Gln Ala Ala Ala Gly Ala Ala Gly Val Ile Lys
1825 1830 1835 1840 

Ala Val Leu Ala Met Arg His Gly Val Leu Pro Arg Ser Leu His Ala 

1845 1850 1855 

Asp Glu Leu Ser Pro His Ile Asp Trp Ala Asp Gly Lys Val Glu Val
1860 1865 1870

Leu Arg Glu Ala Arg Gln Trp Pro Pro Gly Glu Arg Pro Arg Arg Ala

1875 1880 1885

Gly Val Ser Ser Phe Gly Val Ser Gly Thr Asn Ala His Val Ile Val

1890 1895 1900 

Glu Glu Ala Pro Ala Glu Pro Asp Pro Glu Pro Val Pro Ala Ala Pro 
1905 1910 1915 1920 

Gly Gly Pro Leu Pro Phe Val Leu His Gly Arg Ser Val Gln Thr Val

1925 1930 1935

Arg Ser Gln Ala Arg Thr Leu Ala Glu His Leu Arg Thr Thr Gly His

1940 1945 1950

Arg Asp Leu Ala Asp Thr Ala Arg Thr Leu Ala Thr Gly Arg Ala Arg

1955 1960 1965

Phe Asp Val Arg Ala Ala Val Leu Gly Thr Asp Arg Glu Gly Val Cys 

1970 1975 1980 

Ala Ala Leu Asp Ala Leu Ala Gln Asp Arg Pro Ser Pro Asp Val Val
1985 1990 1995 2000 

Ala Pro Ala Val Phe Ala Ala Arg Thr Pro Val Leu Val Phe Pro Gly 

2005 2010 2015 

Gln Gly Ser Gln Trp Val Gly Met Ala Arg Asp Leu Leu Asp Ser Ser

2020 2025 2030 

Glu Val Phe Ala Glu Ser Met Gly Arg Cys Ala Glu Ala Leu Ser Pro 

2035 2040 2045 

Tyr Thr Asp Trp Asp Leu Leu Asp Val Val Arg Gly Val Gly Asp Pro 

2050 2055 2060 

Asp Pro Tyr Asp Arg Val Asp Val Leu Gln Pro Val Leu Phe Ala Val
2065 2070 2075 2080

Met Val Ser Leu Ala Arg Leu Trp Gln Ser Tyr Gly Val Thr Pro Gly

2085 2090 2095

Ala Val Val Gly His Ser Gln Gly Glu Ile Ala Ala Ala His Val Ala

2100 2105 2110 

Gly Ala Leu Ser Leu Ala Asp Ala Ala Arg Val Val Ala Leu Arg Ser 

2115 2120 2125 

Arg Val Leu Arg Glu Leu Asp Asp Gln Gly Gly Met Val Ser Val Gly

2130 2135 2140 

Thr Ser Arg Ala Glu Leu Asp Ser Val Leu Arg Arg Trp Asp Gly Arg 
2145 2150 2155 2160 

Val Ala Val Ala Ala Val Asn Gly Pro Gly Thr Leu Val Val Ala Gly 

2165 2170 2175 

Pro Thr Ala Glu Leu Asp Glu Phe Leu Ala Val Ala Glu Ala Arg Glu 

2180 2185 2190 

Met Arg Pro Arg Arg Ile Ala Val Arg Tyr Ala Ser His Ser Pro Glu

2195 2200 2205

Val Ala Arg Val Glu Gln Arg Leu Ala Ala Glu Leu Gly Thr Val Thr

2210 2215 2220 

Ala Val Gly Gly Thr Val Pro Leu Tyr Ser Thr Ala Thr Gly Asp Leu 
2225 2230 2235 2240 

Leu Asp Thr Thr Ala Met Asp Ala Gly Tyr Trp Tyr Arg Asn Leu Arg 

2245 2250 2255 

Gln Pro Val Leu Phe Glu His Ala Val Arg Ser Leu Leu Glu Arg Gly

2260 2265 2270

Phe Glu Thr Phe Ile Glu Val Ser Pro His Pro Val Leu Leu Met Ala

2275 2280 2285 

Val Glu Glu Thr Ala Glu Asp Ala Glu Arg Pro Val Thr Gly Val Pro 

2290 2295 2300 

Thr Leu Arg Arg Asp His Asp Gly Pro Ser Glu Phe Leu Arg Asn Leu 
2305 2310 2315 2320 

Leu Gly Ala His Val His Gly Val Asp Val Asp Leu Arg Pro Ala Val 

2325 2330 2335 

Ala His Gly Arg Leu Val Asp Leu Pro Thr Tyr Pro Phe Asp Arg Gln

2340 2345 2350

Arg Leu Trp Pro Lys Pro His Arg Arg Ala Asp Thr Ser Ser Leu Gly

2355 2360 2365

Val Arg Asp Ser Thr His Pro Leu Leu His Ala Ala Val Asp Val Pro 

2370 2375 2380 

Gly His Gly Gly Ala Val Phe Thr Gly Arg Leu Ser Pro Asp Glu Gln
2385 2390 2395 2400

Gln Trp Leu Thr Gln His Val Val Gly Gly Arg Asn Leu Val Pro Gly

2405 2410 2415 

Ser Val Leu Val Asp Leu Ala Leu Thr Ala Gly Ala Asp Val Gly Val 

2420 2425 2430 

Pro Val Leu Glu Glu Leu Val Leu Gln Gln Pro Leu Val Leu Thr Ala

2435 2440 2445 

Ala Gly Ala Leu Leu Arg Leu Ser Val Gly Ala Ala Asp Glu Asp Gly 

2450 2455 2460 

Arg Arg Pro Val Glu Ile His Ala Ala Glu Asp Val Ser Asp Pro Ala
2465 2470 2475 2480 

Glu Ala Arg Trp Ser Ala Tyr Ala Thr Gly Thr Leu Ala Val Gly Val 

2485 2490 2495 

Ala Gly Gly Gly Arg Asp Gly Thr Gln Trp Pro Pro Pro Gly Ala Thr

2500 2505 2510 

Ala Leu Thr Leu Thr Asp His Tyr Asp Thr Leu Ala Glu Leu Gly Tyr 

2515 2520 2525 

Glu Tyr Gly Pro Ala Phe Gln Ala Leu Arg Ala Ala Trp Gln His Gly

2530 2535 2540

Asp Val Val Tyr Ala Glu Val Ser Leu Asp Ala Val Glu Glu Gly Tyr
2545 2550 2555 2560

Ala Phe Asp Pro Val Leu Leu Asp Ala Val Ala Gln Thr Phe Gly Leu

2565 2570 2575 

Thr Ser Arg Ala Pro Gly Lys Leu Pro Phe Ala Trp Arg Gly Val Thr 

2580 2585 2590 

Leu His Ala Thr Gly Ala Thr Ala Val Arg Val Val Ala Thr Pro Ala 

2595 2600 2605 

Gly Pro Asp Ala Val Ala Leu Arg Val Thr Asp Pro Thr Gly Gln Leu

2610 2615 2620 

Val Ala Thr Val Asp Ala Leu Val Val Arg Asp Ala Gly Ala Asp Arg 
2625 2630 2635 2640 

Asp Gln Pro Arg Gly Arg Asp Gly Asp Leu His Arg Leu Glu Trp Val

2645 2650 2655 

Arg Leu Ala Thr Pro Asp Pro Thr Pro Ala Ala Val Val His Val Ala 

2660 2665 2670 

Ala Asp Gly Leu Asp Asp Leu Leu Arg Ala Gly Gly Pro Ala Pro Gln

2675 2680 2685 

Ala Val Val Val Arg Tyr Arg Pro Asp Gly Asp Asp Pro Thr Ala Glu 

2690 2695 2700 

Ala Arg His Gly Val Leu Trp Ala Ala Thr Leu Val Arg Arg Trp Leu 
2705 2710 2715 2720

Asp Asp Asp Arg Trp Pro Ala Thr Thr Leu Val Val Ala Thr Ser Ala 

2725 2730 2735 

Gly Val Glu Val Ser Pro Gly Asp Asp Val Pro Arg Pro Gly Ala Ala 

2740 2745 2750 

Ala Val Trp Gly Val Leu Arg Cys Ala Gln Ala Glu Ser Pro Asp Arg

2755 2760 2765 

Phe Val Leu Val Asp Gly Asp Pro Glu Thr Pro Pro Ala Val Pro Asp 

2770 2775 2780 

Asn Pro Gln Leu Ala Val Arg Asp Gly Ala Val Phe Val Pro Arg Leu
2785 2790 2795 2800 

Thr Pro Leu Ala Gly Pro Val Pro Ala Val Ala Asp Arg Ala Tyr Arg 

2805 2810 2815 

Leu Val Pro Gly Asn Gly Gly Ser Ile Glu Ala Val Ala Phe Ala Pro

2820 2825 2830 

Val Pro Asp Ala Asp Arg Pro Leu Ala Pro Glu Glu Val Arg Val Ala 
2835 2840 2845 




Val Arg Ala Thr Gly Val Asn Phe Arg Asp Val Leu Leu Ala Leu Gly 

2850 2855 2860

Met Tyr Pro Glu Pro Ala Glu Met Gly Thr Glu Ala Ser Gly Val Val 
2865 2870 2875 2880 

Thr Glu Val Gly Ser Gly Val Arg Arg Phe Thr Pro Gly Gln Ala Val

2885 2890 2895

Thr Gly Leu Phe Gln Gly Ala Phe Gly Pro Val Ala Val Ala Asp His

2900 2905 2910 

Arg Leu Leu Thr Pro Val Pro Asp Gly Trp Arg Ala Val Asp Ala Ala 

2915 2920 2925

Ala Val Pro Ile Ala Phe Thr Thr Ala His Tyr Ala Leu His Asp Leu

2930 2935 2940

Ala Gly Leu Gln Ala Gly Gln Ser Val Leu Val His Ala Ala Ala Gly
2945 2950 2955 2960 

Gly Val Gly Met Ala Ala Val Ala Leu Ala Arg Arg Ala Gly Ala Glu 

2965 2970 2975 

Val Phe Ala Thr Ala Ser Pro Ala Lys His Pro Thr Leu Arg Ala Leu 

2980 2985 2990 

Gly Leu Asp Asp Asp His Ile Ala Ser Ser Arg Glu Ser Gly Phe Gly

2995 3000 3005 

Glu Arg Phe Ala Ala Arg Thr Gly Gly Arg Gly Val Asp Val Val Leu 

3010 3015 3020 

Asn Ser Leu Thr Gly Asp Leu Leu Asp Glu Ser Ala Arg Leu Leu Ala 
3025 3030 3035 3040

Asp Gly Gly Val Phe Val Glu Met Gly Lys Thr Asp Leu Arg Pro Ala 

3045 3050 3055 

Glu Gln Phe Arg Gly Arg Tyr Val Pro Phe Asp Leu Ala Glu Ala Gly

3060 3065 3070

Pro Asp Arg Leu Gly Glu Ile Leu Glu Glu Val Val Gly Leu Leu Ala

3075 3080 3085 

Ala Gly Ala Leu Asp Arg Leu Pro Val Ser Val Trp Glu Leu Ser Ala 

3090 3095 3100

Ala Pro Ala Ala Leu Thr His Met Ser Arg Gly Arg His Val Gly Lys 
3105 3110 3115 3120 

Leu Val Leu Thr Gln Pro Ala Pro Val His Pro Asp Gly Thr Val Leu

3125 3130 3135 

Val Thr Gly Gly Thr Gly Thr Leu Gly Arg Leu Val Ala Arg His Leu 

3140 3145 3150

Val Thr Gly His Gly Val Pro His Leu Leu Val Ala Ser Arg Arg Gly 

3155 3160 3165 

Pro Ala Ala Pro Gly Ala Ala Glu Leu Arg Ala Asp Val Glu Gly Leu 

3170 3175 3180 

Gly Ala Thr Ile Glu Ile Val Ala Cys Asp Thr Ala Asp Arg Glu Ala
3185 3190 3195 3200

Leu Ala Ala Leu Leu Asp Ser Ile Pro Ala Asp Arg Pro Leu Thr Gly

3205 3210 3215

Val Val His Thr Ala Gly Val Leu Ala Asp Gly Leu Val Thr Ser Ile

3220 3225 3230

Asp Gly Thr Ala Thr Asp Gln Val Leu Arg Ala Lys Val Asp Ala Ala

3235 3240 3245 

Trp His Leu His Asp Leu Thr Arg Asp Ala Asp Leu Ser Phe Phe Val 

3250 3255 3260 

Leu Phe Ser Ser Ala Ala Ser Val Leu Ala Gly Pro Gly Gln Gly Val
3265 3270 3275 3280

Tyr Ala Ala Ala Asn Gly Val Leu Asn Ala Leu Ala Gly Gln Arg Arg

3285 3290 3295

Ala Leu Gly Leu Pro Ala Lys Ala Leu Gly Trp Gly Leu Trp Ala Gln

3300 3305 3310

Ala Ser Glu Met Thr Ser Gly Leu Gly Asp Arg Ile Ala Arg Thr Gly

3315 3320 3325

Val Ala Ala Leu Pro Thr Glu Arg Ala Leu Ala Leu Phe Asp Ala Ala

3330 3335 3340

Leu Arg Ser Gly Gly Glu Val Leu Phe Pro Leu Ser Val Asp Arg Ser 
3345 3350 3355 3360 

Ala Leu Arg Arg Ala Glu Tyr Val Pro Glu Val Leu Arg Gly Ala Val 

3365 3370 3375 

Arg Ser Thr Pro Arg Ala Ala Asn Arg Ala Glu Thr Pro Gly Arg Gly 

3380 3385 3390 

Leu Leu Asp Arg Leu Val Gly Ala Pro Glu Thr Asp Gln Val Ala Ala

3395 3400 3405 

Leu Ala Glu Leu Val Arg Ser His Ala Ala Ala Val Ala Gly Tyr Asp 

3410 3415 3420 

Ser Ala Asp Gln Leu Pro Glu Arg Lys Ala Phe Lys Asp Leu Gly Phe
3425 3430 3435 3440 

Asp Ser Leu Ala Ala Val Glu Leu Arg Asn Arg Leu Gly Val Thr Thr 

3445 3450 3455 

Gly Val Arg Leu Pro Ser Thr Leu Val Phe Asp His Pro Thr Pro Leu 

3460 3465 3470 

Ala Val Ala Glu His Leu Arg Ser Glu Leu Phe Ala Asp Ser Ala Pro 

3475 3480 3485 

Asp Val Gly Val Gly Ala Arg Leu Asp Asp Leu Glu Arg Ala Leu Asp 

3490 3495 3500 

Ala Leu Pro Asp Ala Gln Gly His Ala Asp Val Gly Ala Arg Leu Glu
3505 3510 3515 3520

Ala Leu Leu Arg Arg Trp Gln Ser Arg Arg Pro Pro Glu Thr Glu Pro

3525 3530 3535

Val Thr Ile Ser Asp Asp Ala Ser Asp Asp Glu Leu Phe Ser Met Leu

3540 3545 3550 

Asp Arg Arg Leu Gly Gly Gly Gly Asp Val 
3555 3560

<210> 15
<211> 3201
<212> PRT

<213> Micromonospora megalomicea
<400> 15 

Met Ser Glu Ser Ser Gly Met Thr Glu Asp Arg Leu Arg Arg Tyr Leu 

1 5 10 15 

Lys Arg Thr Val Ala Glu Leu Asp Ser Val Thr Gly Arg Leu Asp Glu 

20 25 30 

Val Glu Tyr Arg Ala Arg Glu Pro Ile Ala Val Val Gly Met Ala Cys

35 40 45

Arg Phe Pro Gly Gly Val Asp Ser Pro Glu Ala Phe Trp Glu Phe Ile

50 55 60

Arg Asp Gly Gly Asp Ala Ile Ala Glu Ala Pro Thr Asp Arg Gly Trp
65 70 75 80 

Pro Pro Ala Pro Arg Pro Arg Leu Gly Gly Leu Leu Ala Glu Pro Gly 

85 90 95 

Ala Phe Asp Ala Ala Phe Phe Gly Ile Ser Pro Arg Glu Ala Leu Ala

100 105 110

Thr Asp Pro Gln Gln Arg Leu Met Leu Glu Ile Ser Trp Glu Ala Leu

115 120 125 

Glu Arg Ala Gly Phe Asp Pro Ser Ser Leu Arg Gly Ser Ala Gly Gly 

130 135 140 

Val Phe Thr Gly Val Gly Ala Val Asp Tyr Gly Pro Arg Pro Asp Glu 
145 150 155 160 

Ala Pro Glu Glu Val Leu Gly Tyr Val Gly Ile Gly Thr Ala Ser Ser

165 170 175 

Val Ala Ser Gly Arg Val Ala Tyr Thr Leu Gly Leu Glu Gly Pro Ala 

180 185 190 

Val Thr Val Asp Thr Ala Cys Ser Ser Gly Leu Thr Ala Val His Leu
biy inr


Arg 


Leu 






340 


Tyr Gly 


Ala 


Asp 




ODD 




Lys Ser 


Asn 


Tip 

lie 








M n f T , , 0 

weti jjys 


i nr 


Ua 1 
val 


iq«; 

ODD 






nib cTlc 


Den 
nS p 


Gl 11 
Ul u 


Ocl val 


Ua 1 
Val 


Ocl 






420 


Arg Ala 


Gly 


Val 




4 oo 




T 1 a \/a 1 

lie val 


r:i n* 

ulu 


ulu 








Pro Gly 




nla 


4 DO 






Aia bill 


Pro 


PI i/ 


Glu Arg 


Ala 


Leu 






500 


Asp Asp 


Pro 


Ala 




O 1 o 




nig nig 


Al s 


Thr 
i ill 


^ 10 

JJU 






uXU val 


T on 


Al a 

nl a 








(71 w Ala 


Val 


1. 


Phe Pro 


Glv 


Glri 






580 


Arg Gin 


Ser 


Pro 




595 




Ton 2k 1 a 
Jjcu nia 




His 


610 






bin oer 


Leu 


ASp 


625 






Met Val 


Ser 


Leu 


Ala Val 


Val 


Gly 






660 


Gly Ala 


Leu 


Ser 


675 









200 


Leu 


Arg 


Arg Asp 






215 


Met 


Ser 


Ser Pro 




230 




Ala 


Glu 


Asp Gly 


245 






Glv 

* 


Leu 


Ala Glu 


Ala 


Ara 


Ala Glu 






280 


lie 


Asn 


Gin Asp 






295 


Al a 
nlu 


Gl n 


Am Am 




310 




Pro 


Val 
v ax 


Aso Val 


325 






Glv 


Asp 


Pro lie 


Am 
niy 


Glu 


Pro Gly 






360 


Gly 


His 


Thr Gin 






375 


Leu 


Ala 


T.PU Ara 




390 




Pro 


Ser 


Pro His 


405 






Glu 


Thr 


Arg Pro 


Ser 


Ser 


Phe Gly 






440 


Ala 


Pro 


Ser Pro 






455 


Thr 


Gly 


Ala Thr 




470 




Ala 


Glu 


Ala Val 


485 






Arg 


Ala 


Gin Ala 


Pro 


Ser 


Leu Arg 






520 


Trp 


Glu 


His Arg 






535 


Glv 


Leu 


Arg Ala 




550 




GlV 


Ara 


Ala Arg 


565 






Glv 


Ala 


Gin Trp 


Thr 


Phe 


Ala Glu 






600 


Val 


Asp 


Trp Ser 






615 




Val 


Aso Val 




630 




Ala 


Arg 


Leu Trp 


645 






His 


Ser 


Gin Gly 


Leu 


Ala 


Asp Ala 






680 



Glu 


Cys 


Thr Leu 






220 


Glv 


Ala 


Phe Thr 






235 


Ara 


Cvs 


Lys Pro 




250 




Glv 


Ala 


Gly Val 


265 






Glv 


Arg 


Pro Val 


Gly 


Ala 


Ser Asn 






300 


Val 


Tip 
11V 


A rn Gin 
nl^ Ol i 1 






11 5 
0 1 o 


nop 


iyr 


Val Glu 
vai vjiu 




oou 




Glu 


Ala 


His Ala 


O 4 O 






Arg 


Pro 


Leu Trp 


Ala 


Ala 


Ala Gly 






380 


His 


Arg 


Glu lie 






395 


Val 


Asp 


Trp Asp 




410 




Tm 
Ar P 


Prn 


Val Gly 


425 






Hp 




Glv Thr 
ui y i in 


Gin 


Ala 


Ala Asp 






460 


no 


Gl \7 

vjiy 


Thr fien 
1111 nop 








Ala 


Leu 


Val Phe 




4 QO 




Ala 


Arg 


Leu Ala 


505 






A Qn 
nop 


Thr 

1 111 


Ala Php 
nld rue 


Ala 


Val 


Val Val 






540 


Val 


Ala 


Gly Gly 






555 


Ala 


Glv 


Arg Arg 




570 




Gin 


Gly 


Met Ala 


585 






Ser 


lie 


Asp Ala 


Leu 


Arg 


Glu Val 






620 


Val 


Gin 


Pro Val 






635 


Gin 


Ser 


Tyr Gly 




650 




Glu 


He 


Ala Ala 


665 






Ala 


Arg 


Val Val 






205 



Val 


Leu 


Ala 


Glv 

* 


Glu 


Phe 


Arg 


Ser 








240 


Phe 


Ser 


Arg 


Ala 






255 




Leu 


Val 


Leu 


Gin 




270 






Leu 


Ala 


Val 


Leu 


285 








Glv 


Leu 


Thr 


Ala 


Ala 


Leu 


Glu 


Arg 








320 


Ala 


His 


Gly 


Thr 






335 




Leu 


Leu 


Asp 


Thr 




350 






Val 


Glv 


Ser 


Val 


365 








Val 


Ala 


Glv 


Val 


Pro 


Ala 


Thr 


Leu 








400 


Arg 


Glv 


Ala 


Val 






415 




Glu 


Ara 


Pro 


Ara 




430 






Asn 


Ala 


His 


Val 


445 








Leu 


Asp 


Pro 


Thr 


Ala 


Ala 


Pro 


Thr 








480 


Ser 


Ala 


Arg 


Asp 






495 




Asp 


Arg 


Leu 


Thr 




510 






Thr 


Leu 


Val 


Thr 


525 








Gly 


Gly 


Gly 


Glu 


Arg 


Pro 


Val 


Asp 








560 


Val 


Val 


Leu 


Val
Arg


Asp 


Leu 


Leu
Cys


Glu 


Axq 


Ala 


605 








Leu 


Asp 


Gly 


Glu 


Leu 


Phe 


Ala 


Val
Val


Thr
Gly






655 




Ala 


His 


Val 


Ala 




670 






Ala 


Leu 


Arg 


Ser
Arg


Val
Glu
955
T 7 _ 1

Val 


Pro 


Pro Ala 


Trp 
965 


Thr 


Asp 


Val 


Val 


ax y
Asp


Gly 


Leu 


Glu 


Gln
Arg


Gly 


Ala 


Thr Val 


Val
Cys


Thr 


Ala 


Gin 


Ser 


Arg 


Ala 


Arg 


He 


Gly 




980 










985
Ala


Ala 


Leu Asp 


Ala 


Val 


Asp 


Gly 


Thr 


Ala 


Leu 


Ser 


Thr 


Val 


Val 


Ser 






995 








1000 








1005 






Leu 


Leu 


Ala Leu 


Ala 


Glu 


Gly 


Gly Ala 


Val 


Asp 


Asp 


Pro 


Ser 


Leu 


Asp 




1010 






1015 








1020 








Thr 


Leu 


Ala Leu 


Val 


Gin 


Ala Leu Gly 


Ala 


Ala 


Gly He Asp 


Val 


Pro 



1025 1030 1035 1040 

Leu Trp Leu Val Thr Arg Asp Ala Ala Ala Val Thr Val Gly Asp Asp 

1045 1050 1055 

Val Asp Pro Ala Gln Ala Met Val Gly Gly Leu Gly Arg Val Val Gly

1060 1065 1070 

Val Glu Ser Pro Ala Arg Trp Gly Gly Leu Val Asp Leu Arg Glu Ala 

1075 1080 1085 

Asp Ala Asp Ser Ala Arg Ser Leu Ala Ala Ile Leu Ala Asp Pro Arg

1090 1095 1100

Gly Glu Glu Gln Phe Ala Ile Arg Pro Asp Gly Val Thr Val Ala Arg
1105 1110 1115 1120 

Leu Val Pro Ala Pro Ala Arg Ala Ala Gly Thr Arg Trp Thr Pro Arg 

1125 1130 1135 

Gly Thr Val Leu Val Thr Gly Gly Thr Gly Gly Ile Gly Ala His Leu

1140 1145 1150 

Ala Arg Trp Leu Ala Gly Ala Gly Ala Glu His Leu Val Leu Leu Asn 

1155 1160 1165 

Arg Arg Gly Ala Glu Ala Ala Gly Ala Ala Asp Leu Arg Asp Glu Leu

1170 1175 1180

Val Ala Leu Gly Thr Gly Val Thr Ile Thr Ala Cys Asp Val Ala Asp
1185 1190 1195 1200

Arg Asp Arg Leu Ala Ala Val Leu Asp Ala Ala Arg Ala Gln Gly Arg

1205 1210 1215

Val Val Thr Ala Val Phe His Ala Ala Gly Ile Ser Arg Ser Thr Ala

1220 1225 1230

Val Gln Glu Leu Thr Glu Ser Glu Phe Thr Glu Ile Thr Asp Ala Lys

1235 1240 1245 

Val Arg Gly Thr Ala Asn Leu Ala Glu Leu Cys Pro Glu Leu Asp Ala 

1250 1255 1260 

Leu Val Leu Phe Ser Ser Asn Ala Ala Val Trp Gly Ser Pro Gly Leu 
1265 1270 1275 1280 

Ala Ser Tyr Ala Ala Gly Asn Ala Phe Leu Asp Ala Phe Ala Arg Arg 

1285 1290 1295 

Gly Arg Arg Ser Gly Leu Pro Val Thr Ser Ile Ala Trp Gly Leu Trp

1300 1305 1310

Ala Gly Gln Asn Met Ala Gly Thr Glu Gly Gly Asp Tyr Leu Arg Ser

1315 1320 1325

Gln Gly Leu Arg Ala Met Asp Pro Gln Arg Ala Ile Glu Glu Leu Arg

1330 1335 1340

Thr Thr Leu Asp Ala Gly Asp Pro Trp Val Ser Val Val Asp Leu Asp
1345 1350 1355 1360

Arg Glu Arg Phe Val Glu Leu Phe Thr Ala Ala Arg Arg Arg Pro Leu

1365 1370 1375

Phe Asp Glu Leu Gly Gly Val Arg Ala Gly Ala Glu Glu Thr Gly Gln

1380 1385 1390 

Glu Ser Asp Leu Ala Arg Arg Leu Ala Ser Met Pro Glu Ala Glu Arg 

1395 1400 1405 

His Glu His Val Ala Arg Leu Val Arg Ala Glu Val Ala Ala Val Leu 

1410 1415 1420 

Gly His Gly Thr Pro Thr Val Ile Glu Arg Asp Val Ala Phe Arg Asp
1425 1430 1435 1440 

Leu Gly Phe Asp Ser Met Thr Ala Val Asp Leu Arg Asn Arg Leu Ala 

1445 1450 1455 

Ala Val Thr Gly Val Arg Val Ala Thr Thr Ile Val Phe Asp His Pro

1460 1465 1470 

Thr Val Asp Arg Leu Thr Ala His Tyr Leu Glu Arg Leu Val Gly Glu 

1475 1480 1485 

Pro Glu Ala Thr Thr Pro Ala Ala Ala Val Val Pro Gln Ala Pro Gly

1490 1495 1500

Glu Ala Asp Glu Pro Ile Ala Ile Val Gly Met Ala Cys Arg Leu Ala
1505 1510 1515 1520

Gly Gly Val Arg Thr Pro Asp Gln Leu Trp Asp Phe Ile Val Ala Asp

1525 1530 1535 

Gly Asp Ala Val Thr Glu Met Pro Ser Asp Arg Ser Trp Asp Leu Asp 

1540 1545 1550 

Ala Leu Phe Asp Pro Asp Pro Glu Arg His Gly Thr Ser Tyr Ser Arg 

1555 1560 1565 

His Gly Ala Phe Leu Asp Gly Ala Ala Asp Phe Asp Ala Ala Phe Phe 

1570 1575 1580 

Gly Ile Ser Pro Arg Glu Ala Leu Ala Met Asp Pro Gln Gln Arg Gln
1585 1590 1595 1600

Val Leu Glu Thr Thr Trp Glu Leu Phe Glu Asn Ala Gly Ile Asp Pro

1605 1610 1615

His Ser Leu Arg Gly Thr Asp Thr Gly Val Phe Leu Gly Ala Ala Tyr

1620 1625 1630

Gln Gly Tyr Gly Gln Asn Ala Gln Val Pro Lys Glu Ser Glu Gly Tyr

1635 1640 1645

Leu Leu Thr Gly Gly Ser Ser Ala Val Ala Ser Gly Arg Ile Ala Tyr
1650 1655 1660

Val Leu Gly Leu Glu Gly Pro Ala Ile Thr Val Asp Thr Ala Cys Ser
1665 1670 1675 1680

Ser Ser Leu Val Ala Leu His Val Ala Ala Gly Ser Leu Arg Ser Gly 

1685 1690 1695 

Asp Cys Gly Leu Ala Val Ala Gly Gly Val Ser Val Met Ala Gly Pro 

1700 1705 1710 

Glu Val Phe Thr Glu Phe Ser Arg Gln Gly Ala Leu Ala Pro Asp Gly

1715 1720 1725

Arg Cys Lys Pro Phe Ser Asp Gln Ala Asp Gly Phe Gly Phe Ala Glu

1730 1735 1740

Gly Val Ala Val Val Leu Leu Gln Arg Leu Ser Val Ala Val Arg Glu
1745 1750 1755 1760

Gly Arg Arg Val Leu Gly Val Val Val Gly Ser Ala Val Asn Gln Asp

1765 1770 1775

Gly Ala Ser Asn Gly Leu Ala Ala Pro Ser Gly Val Ala Gln Gln Arg

1780 1785 1790

Val Ile Arg Arg Ala Trp Gly Arg Ala Gly Val Ser Gly Gly Asp Val

1795 1800 1805 

Gly Val Val Glu Ala His Gly Thr Gly Thr Arg Leu Gly Asp Pro Val 

1810 1815 1820 

Glu Leu Gly Ala Leu Leu Gly Thr Tyr Gly Val Gly Arg Gly Gly Val 
1825 1830 1835 1840

Gly Pro Val Val Val Gly Ser Val Lys Ala Asn Val Gly His Val Gln

1845 1850 1855

Ala Ala Ala Gly Val Val Gly Val Ile Lys Val Val Leu Gly Leu Gly

1860 1865 1870 

Arg Gly Leu Val Gly Pro Met Val Cys Arg Gly Gly Leu Ser Gly Leu 

1875 1880 1885 

Val Asp Trp Ser Ser Gly Gly Leu Val Val Ala Asp Gly Val Arg Gly 

1890 1895 1900 

Trp Pro Val Gly Val Asp Gly Val Arg Arg Gly Gly Val Ser Ala Phe 
1905 1910 1915 1920

Gly Val Ser Gly Thr Asn Ala His Val Val Val Ala Glu Ala Pro Gly 

1925 1930 1935 

Ser Val Val Gly Ala Glu Arg Pro Val Glu Gly Ser Ser Arg Gly Leu 

1940 1945 1950 

Val Gly Val Ala Gly Gly Val Val Pro Val Val Leu Ser Ala Lys Thr 

1955 1960 1965 

Glu Thr Ala Leu Thr Glu Leu Ala Arg Arg Leu His Asp Ala Val Asp 

1970 1975 1980 

Asp Thr Val Ala Leu Pro Ala Val Ala Ala Thr Leu Ala Thr Gly Arg 
1985 1990 1995 2000

Ala His Leu Pro Tyr Arg Ala Ala Leu Leu Ala Arg Asp His Asp Glu 

2005 2010 2015 

Leu Arg Asp Arg Leu Arg Ala Phe Thr Thr Gly Ser Ala Ala Pro Gly 

2020 2025 2030 

Val Val Ser Gly Val Ala Ser Gly Gly Gly Val Val Phe Val Phe Pro 

2035 2040 2045 

Gly Gln Gly Gly Gln Trp Val Gly Met Ala Arg Gly Leu Leu Ser Val

2050 2055 2060

Pro Val Phe Val Glu Ser Val Val Glu Cys Asp Ala Val Val Ser Ser
2065 2070 2075 2080

Val Val Gly Phe Ser Val Leu Gly Val Leu Glu Gly Arg Ser Gly Ala 

2085 2090 2095 

Pro Ser Leu Asp Arg Val Asp Val Val Gln Pro Val Leu Phe Val Val

2100 2105 2110 

Met Val Ser Leu Ala Arg Leu Trp Arg Trp Cys Gly Val Val Pro Ala 

2115 2120 2125 

Ala Val Val Gly His Ser Gln Gly Glu Ile Ala Ala Ala Val Val Ala

2130 2135 2140

Gly Val Leu Ser Val Gly Asp Gly Ala Arg Val Val Ala Leu Arg Ala
2145 2150 2155 2160

Arg Ala Leu Arg Ala Leu Ala Gly His Gly Gly Met Val Ser Leu Ala 

2165 2170 2175 

Val Ser Ala Glu Arg Ala Arg Glu Leu Ile Ala Pro Trp Ser Asp Arg

2180 2185 2190

Ile Ser Val Ala Ala Val Asn Ser Pro Thr Ser Val Val Val Ser Gly

2195 2200 2205

Asp Pro Gln Ala Leu Ala Ala Leu Val Ala His Cys Ala Glu Thr Gly

2210 2215 2220

Glu Arg Ala Lys Thr Leu Pro Val Asp Tyr Ala Ser His Ser Ala His
2225 2230 2235 2240

Val Glu Gln Ile Arg Asp Thr Ile Leu Thr Asp Leu Ala Asp Val Thr

2245 2250 2255

Ala Arg Arg Pro Asp Val Ala Leu Tyr Ser Thr Leu His Gly Ala Arg 

2260 2265 2270 

Gly Ala Gly Thr Asp Met Asp Ala Arg Tyr Trp Tyr Asp Asn Leu Arg 

2275 2280 2285 

Ser Pro Val Arg Phe Asp Glu Ala Val Glu Ala Ala Val Ala Asp Gly 

2290 2295 2300 

Tyr Arg Val Phe Val Glu Met Ser Pro His Pro Val Leu Thr Ala Ala 
2305 2310 2315 2320 

Val Gln Glu Ile Asp Asp Glu Thr Val Ala Ile Gly Ser Leu His Arg

2325 2330 2335 

Asp Thr Gly Glu Arg His Leu Val Ala Glu Leu Ala Arg Ala His Val 

2340 2345 2350 

His Gly Val Pro Val Asp Trp Arg Ala Ile Leu Pro Ala Thr His Pro

2355 2360 2365 

Val Pro Leu Pro Asn Tyr Pro Phe Glu Ala Thr Arg Tyr Trp Leu Ala 

2370 2375 2380 

Pro Thr Ala Ala Asp Gln Val Ala Asp His Arg Tyr Arg Val Asp Trp
2385 2390 2395 2400

Arg Pro Leu Ala Thr Thr Pro Ala Glu Leu Ser Gly Ser Tyr Leu Val

2405 2410 2415 

Phe Gly Asp Ala Pro Glu Thr Leu Gly His Ser Val Glu Lys Ala Gly 

2420 2425 2430 

Gly Leu Leu Val Pro Val Ala Ala Pro Asp Arg Glu Ser Leu Ala Val 

2435 2440 2445 

Ala Leu Asp Glu Ala Ala Gly Arg Leu Ala Gly Val Leu Ser Phe Ala 

2450 2455 2460 

Ala Asp Thr Ala Thr His Leu Ala Arg His Arg Leu Leu Gly Glu Ala 
2465 2470 2475 2480 

Asp Val Glu Ala Pro Leu Trp Leu Val Thr Ser Gly Gly Val Ala Leu 

2485 2490 2495 

Asp Asp His Asp Pro Ile Asp Cys Asp Gln Ala Met Val Trp Gly Ile

2500 2505 2510 

Gly Arg Val Met Gly Leu Glu Thr Pro His Arg Trp Gly Gly Leu Val 

2515 2520 2525 

Asp Val Thr Val Glu Pro Thr Ala Glu Asp Gly Val Val Phe Ala Ala 

2530 2535 2540 

Leu Leu Ala Ala Asp Asp His Glu Asp Gln Val Ala Leu Arg Asp Gly
2545 2550 2555 2560

Ile Arg His Gly Arg Arg Leu Val Arg Ala Pro Leu Thr Thr Arg Asn

2565 2570 2575 

Ala Arg Trp Thr Pro Ala Gly Thr Ala Leu Val Thr Gly Gly Thr Gly 

2580 2585 2590 

Ala Leu Gly Gly His Val Ala Arg Tyr Leu Ala Arg Ser Gly Val Thr 

2595 2600 2605 

Asp Leu Val Leu Leu Ser Arg Ser Gly Pro Asp Ala Pro Gly Ala Ala 

2610 2615 2620 

Glu Leu Ala Ala Glu Leu Ala Asp Leu Gly Ala Glu Pro Arg Val Glu 
2625 2630 2635 2640

Ala Cys Asp Val Thr Asp Gly Pro Arg Leu Arg Ala Leu Val Gln Glu

2645 2650 2655

Leu Arg Glu Gln Asp Arg Pro Val Arg Ile Val Val His Thr Ala Gly

2660 2665 2670

Val Pro Asp Ser Arg Pro Leu Asp Arg Ile Asp Glu Leu Glu Ser Val

2675 2680 2685 

Ser Ala Ala Lys Val Thr Gly Ala Arg Leu Leu Asp Glu Leu Cys Pro 

2690 2695 2700

Asp Ala Asp Thr Phe Val Leu Phe Ser Ser Gly Ala Gly Val Trp Gly 
2705 2710 2715 2720 

Ser Ala Asn Leu Gly Ala Tyr Ala Ala Ala Asn Ala Tyr Leu Asp Ala 

2725 2730 2735 

Leu Ala His Arg Arg Arg Gln Ala Gly Arg Ala Ala Thr Ser Val Ala

2740 2745 2750 

Trp Gly Ala Trp Ala Gly Asp Gly Met Ala Thr Gly Asp Leu Asp Gly 

2755 2760 2765 

Leu Thr Arg Arg Gly Leu Arg Ala Met Ala Pro Asp Arg Ala Leu Arg 

2770 2775 2780 

Ala Cys Thr Arg Arg Trp Thr Thr His Asp Thr Cys Val Ser Val Ala 
2785 2790 2795 2800 

Asp Val Asp Trp Asp Arg Phe Ala Val Gly Phe Thr Ala Ala Arg Pro 

2805 2810 2815 

Arg Pro Leu Ile Asp Glu Leu Val Thr Ser Ala Pro Val Ala Ala Pro

2820 2825 2830

Thr Ala Ala Ala Ala Pro Val Pro Ala Met Thr Ala Asp Gln Leu Leu

2835 2840 2845

Gln Phe Thr Arg Ser His Val Ala Ala Ile Leu Gly His Gln Asp Pro

2850 2855 2860

Asp Ala Val Gly Leu Asp Gln Pro Phe Thr Glu Leu Gly Phe Asp Ser
2865 2870 2875 2880

Leu Thr Ala Val Gly Leu Arg Asn Gln Leu Gln Gln Ala Thr Gly Arg

2885 2890 2895

Thr Leu Pro Ala Ala Leu Val Phe Gln His Pro Thr Val Arg Arg Leu

2900 2905 2910

Ala Asp His Leu Ala Gln Gln Leu Asp Val Gly Thr Ala Pro Val Glu

2915 2920 2925

Ala Thr Gly Ser Val Leu Arg Asp Gly Tyr Arg Arg Ala Gly Gln Thr

2930 2935 2940 

Gly Asp Val Arg Ser Tyr Leu Asp Leu Leu Ala Asn Leu Ser Glu Phe 
2945 2950 2955 2960

Arg Glu Arg Phe Thr Asp Ala Ala Ser Leu Gly Gly Gln Leu Glu Leu

2965 2970 2975

Val Asp Leu Ala Asp Gly Ser Gly Pro Val Thr Val Ile Cys Cys Ala

2980 2985 2990 

Gly Thr Ala Ala Leu Ser Gly Pro His Glu Phe Ala Arg Leu Ala Ser 

2995 3000 3005 

Ala Leu Arg Gly Thr Val Pro Val Arg Ala Leu Ala Gln Pro Gly Tyr

3010 3015 3020 

Glu Ala Gly Glu Pro Val Pro Ala Ser Met Glu Ala Val Leu Gly Val 
3025 3030 3035 3040 

Gln Ala Asp Ala Val Leu Ala Ala Gln Gly Asp Thr Pro Phe Val Leu

3045 3050 3055 

Val Gly His Ser Ala Gly Ala Leu Met Ala Tyr Ala Leu Ala Thr Glu 

3060 3065 3070 

Leu Ala Asp Arg Gly His Pro Pro Arg Gly Val Val Leu Leu Asp Val 

3075 3080 3085 

Tyr Pro Pro Gly His Gln Glu Ala Val His Ala Trp Leu Gly Glu Leu

3090 3095 3100 

Thr Ala Ala Leu Phe Asp His Glu Thr Val Arg Met Asp Asp Thr Arg 
3105 3110 3115 3120 

Leu Thr Ala Leu Gly Ala Tyr Asp Arg Leu Thr Gly Arg Trp Arg Pro

3125 3130 3135

Arg Asp Thr Gly Leu Pro Thr Leu Val Val Ala Ala Ser Glu Pro Met 

3140 3145 3150 

Gly Glu Trp Pro Asp Asp Gly Trp Gln Ser Thr Trp Pro Phe Gly His

3155 3160 3165

Asp Arg Val Thr Val Pro Gly Asp His Phe Ser Met Val Gln Glu His

3170 3175 3180

Ala Asp Ala Ile Ala Arg His Ile Asp Ala Trp Leu Ser Gly Glu Arg
3185 3190 3195 3200 

Ala 



<210> 16 
<211> 358 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 16 



Met
Thr


Thr
Gly Arg
30




Leu 


Leu
Gly


His 


Asp 


Asp 


Asp 


Pro 


His Arg 


Trp Tyr 


Arg Gly Leu 






35 










40 






45 






Gly 


Gly 


Ser 


Gly 


Val 


Arg 


Arg 


Ser 


Arg 


Thr Glu 


Thr Trp 


Val 


Val Thr 




50 










55 








60 






Asp 


His 


Ala 


Thr 


Ala 


Val 


Arg 


Val 


Leu 


Asp Asp 


Pro Thr 


Phe 


Thr Arg 


65 










/ u 








"7 R 






80 


Ala 


Thr 


Gly 


Arg 


Thr 


Pro 


blU 


Trp 


Met 


Arg Ala 


Ala Gly 


Ala 


Pro Ala 










q t; 










Qft 






95 


Ser 


Thr 


Trp 


Ala 


o>in 


fro 


rne 


Arg 


Asp 


vai his 


Aia Ala 


Ser 


Trp Asp 








1UU 










i ft £ 

IUj 






110 




Ala 


Glu 


Leu 


pro 


Asp 


Pro 


bin 


blU 


vai 


QslU ASp 


Arg Leu 


Thr 


Gly Leu 






1 1 c 

ilD 










1 












Leu 


Pro 


Ala 


Pro 


Caiy 


i nr 


Arg 


Leu 


Asp 


lieu vai 


Arg Asp 


Leu 


Ala Trp 




130 










135 








140 






Pro 


Met 


Ala 


Ser 


Arg 


Gly 


Val 


Gly 


Ala 


Asp Asp 


Pro Asp 


Val 


Leu Arg 


145 










150 








155 






160 


Ala 


Ala 


Trp 


Asp 


Ala 


Arg 


Val 


Gly 


Leu 


Asp Ala 


Gin Leu 


Thr 


Pro Gin 










165 










170 






175 


Pro 


Leu 


Ala 


Val 


Thr 


Glu 


Ala 


Ala 


He 


Ala Ala 


Val Pro 


Gly Asp Pro 








180 










185 






190 




His 


Arg 


Arg 


Ala 


Leu 


Phe 


Thr 


Ala 


Val 


Glu Met 


Thr Ala 


Thr 


Ala Phe 




195 










200 






205 






Val 


Asp 


Ala 


Val 


Leu 


Ala 


Val 


Thr 


Ala 


Thr Ala 


Gly Ala 


Ala 


Gin Arg 




210 










215 








220 






Leu 


Ala 


Asp 


Asp 


Pro 


Asp 


Val 


Ala 


Ala 


Arg Leu 


Val Ala 


Glu 


Val Leu 


225 










230 








235 






240 


Arg 


Leu 


His 


Pro 


Thr 


Ala 


His 


Leu 


Glu 


Arg Arg 


Thr Ala 


Gly Thr Glu 










245 










250 






255 


Thr 


Val 


Val 


Gly 


Glu 


His 


Thr 


Val 


Ala 


Ala Gly 


Asp Glu 


Val 


Val Val 








260 










2 65 






270 




Val 


Val 


Ala 


Ala 


Ala 


Asn 


Arg 


Asp 


Ala 


Gly Val 


Phe Ala 


Asp 


Pro Asp 






275 










280 






285 






Arg 


Leu 


Asp 


Pro 


Asp 


Arg 


Ala 


Asp 


Ala 


Asp Arg 


Ala Leu 


Ser 


Ala Gin 




290 










295 








300 






Arg 


Gly 


His 


Pro 


Gly 


Arg 


Leu 


Glu 


Glu 


Leu Val 


Val Val 


Leu 


Thr Thr 


305 










310 








315 






320 


Ala 


Ala 


Leu 


Arg 


Ser 


Val 


Ala 


Lys 


Ala 


Leu Pro 


Gly Leu 


Thr 


Ala Gly 










325 










330 






335 


Gly 


Pro 


Val 


Val 


Arg 


Arg 


Arg 


Arg 


Ser 


Pro Val 


Leu Arg 


Ala 


Thr Ala
340

His Cys Pro Val Glu Leu 
355 



345 



350 



<210> 17 
<211> 422 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 17 



Met 


Arg 


Val 


va± 


pne 


Ser 


Ser 


Met 


Aia 


Car 

oer 


Lys 


Car 

oer 


XlX S 


Leu 


rne 


Giy 


1 






c 
0 










1U 










1 D 




Leu 


Val 


Pro 


Leu 


TV 1 -» 

Ala 


Trp 


Ala 


pne 


Arg 


Ala 


Aia 


Gly 


HIS 


GXU 


vai 


Arg 








O A 
ZU 










zo 










JU 






Val 


Val 


Ala 


Ser 


Pro 


Ala 

Ala 


Leu 


rnr 


ASp 


Asp 


T 1 A 

lie 


Tnr 


Aia 


HI -» 

Ala 


Gly 


Leu 






35 










4 n 
9 u 


















Thr 


Ala 


Val 


Pro 


Val 


Gly 


rn V-> v- 

mr 


ASp 


vai 


ASp 


T a • i 

Leu 


vai 


ASp 


pne 


H*a4- 

Mec 


inr 




C ft 

50 










JJ 










OU 










His 


Ala 


Gly 


Tyr 


Asp 


lie 


lie 


Asp 


Tyr 


vai 


Arg 


Ser 


Leu 


Asp 


pne 


oer 


65 










~7 n 










i q 
/ D 










0 U 


Glu 


Arg 


Asp 


Pro 


Ala 


i nr 


Ser 


i nr 


Trp 


Asp 


nls 


Leu 


Leu 


Gxy 


Wet 


Gin 










O D 




















?j 




Thr Val Leu Thr Pro Thr Phe Tyr Ala Leu Met Ser Pro Asp Ser Leu
100 105 110






Val Glu Gly Met Ile Ser Phe Cys Arg Ser Trp Arg Pro Asp Trp Ser
115 120 125








Ser Gly Pro Gln Thr Phe Ala Ala Ser Ile Ala Ala Thr Val Thr Gly
130 135 140




















Val Ala His Ala Arg Leu Leu Trp Gly Pro Asp Ile Thr Val Arg Ala
145 150 155 160


Arg Gln Lys Phe Leu Gly Leu Leu Pro Gly Gln Pro Ala Ala His Arg
165 170 175




Glu Asp Pro Leu Ala Glu Trp Leu Thr Trp Ser Val Glu Arg Phe Gly
180 185 190






Gly 


Arg 


val 


pro 


bin 


Asp 


vai 


ulU 




Leu 


vai 


vax 


biy 


oxn 


Trp 
















200 










205 








Tl a 

lie 


TV a*i 

ASp 


fro 


Ala 


Pro 


vai 


vsxy 




Arg 


Leu 


Asp 


Thy 
J. tlx 


Vjxy 


Leu 


Arg 


TK y 




0 1 fl 
clU 










j 










220 










vai 


Giy 


Met 


Arg 


Tyr 


V dl 




lyr 


Aon 


vjxy 


Pro 
rx 


JCl. 


vai 


VGA 


P it* 


Sen 
rvo ^J 


j 










230 










235 










240 


Trp 


T on 

Lieu 


nx b 


7\ en 


Glu 




Thr 




Ara 
Mxg 


Arg 


Val 


Cys 


Leu 


Thr 


Leu 


Gly








245 










250 










255 




Tip 

lie 




OCX. 


7A T*ri 




Asn 


Ser 


Ile


Gly


Gln


Val 


Ser 


Val 


Asp 


Asp 


Leu 








260 










265 










270 






Leu Gly Ala Leu Gly Asp Val Asp Ala Glu Ile Ile Ala Thr Val Asp
275 280 285








Glu Gln Gln Leu Glu Gly Val Ala His Val Pro Ala Asn Ile Arg Thr
290 295 300










Val Gly Phe Val Pro Met His Ala Leu Leu Pro Thr Cys Ala Ala Thr
305 310 315 320


Val His His Gly Gly Pro Gly Ser Trp His Thr Ala Ala Ile His Gly
325 330 335




Val Pro Gln Val Ile Leu Pro Asp Gly Trp Asp Thr Gly Val Arg Ala
340 345 350






Gln Arg Thr Glu Asp Gln Gly Ala Gly Ile Ala Leu Pro Val Pro Glu
355 360 365








Leu Thr Ser Asp Gln Leu Arg Glu Ala Val Arg Arg Val Leu Asp Asp
370 375 380










Pro Ala Phe Thr Ala Gly Ala Ala Arg Met Arg Ala Asp Met Leu Ala
385 390 395 400


Glu Pro Ser Pro Ala Glu Val Val Asp Val Cys Ala Gly Leu Val Gly
405 410 415

Glu Arg Thr Ala Val Gly
420



<210> 18 
<211> 323 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 18 



Met Ser Thr Asp Ala Thr His Val Arg Leu Gly Arg Cys Ala Leu Leu
1 5 10 15




Thr Ser Arg Leu Trp Leu Gly Thr Ala Ala Leu Ala Gly Gln Asp Asp
20 25 30




















Ala Asp Ala Val Arg Leu Leu Asp His Ala Arg Ser Arg Gly Val Asn
35 40 45












Cys Leu Asp Thr Ala Asp Asp Asp Ser Ala Ser Thr Ser Ala Gln Val
50 55 60














Ala Glu Glu Ser Val Gly Arg Trp Leu Ala Gly Asp Thr Gly Arg Arg
65 70 75 80


Glu Glu Thr Val Leu Ser Val Thr Val Gly Val Pro Pro Gly Gly Gln
85 90 95




Val Gly Gly Gly Gly Leu Ser Ala Arg Gln Ile Ile Ala Ser Cys Glu
100 105 110






Gly Ser Leu Arg Arg Leu Gly Val Asp His Val Asp Val Leu His Leu
115 120 125






Pro Arg Val Asp Arg Val Glu Pro Trp Asp Glu Val Trp Gln Ala Val
130 135 140








Asp Ala Leu Val Ala Ala Gly Lys Val Cys Tyr Val Gly Ser Ser Gly
145 150 155 160


Phe Pro Gly Trp His Ile Val Ala Ala Gln Glu His Ala Val Arg Arg
165 170 175




His 


Arg 


Leu 


Giy 


Leu 


vax 


Ser 


nig 


uin 


cys niy iy^ 


TV ts r"\ T 
nSp L6U 


1 nr 


Del 








180 










185 




190 






Arg His Pro Glu Leu Glu Val Leu Pro Ala Ala Gln Ala Tyr Gly Leu
195 200 205






Gly Val Phe Ala Arg Pro Thr Arg Leu Gly Gly Leu Leu Gly Gly Asp
210 215 220








Gly Pro Gly Ala Ala Ala Ala Arg Ala Ser Gly Gln Pro Thr Ala Leu
225 230 235 240


Arg Ser Ala Val Glu Ala Tyr Glu Val Phe Cys Arg Asp Leu Gly Glu
245 250 255




His Pro Ala Glu Val Ala Leu Ala Trp Val Leu Ser Arg Pro Gly Val
260 265 270






Ala Gly Ala Val Val Gly Ala Arg Thr Pro Gly Arg Leu Asp Ser Ala
275 280 285






Leu Arg Ala Cys Gly Val Ala Leu Gly Ala Thr Glu Leu Thr Ala Leu
290 295 300








Asp Gly Ile Phe Pro Gly Val Ala Ala Ala Gly Ala Ala Pro Glu Ala
305 310 315 320


Trp Leu Arg























<210> 19 
<211> 247 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 19 

Met Asn Thr Trp Leu Arg Arg Phe Gly Ser Ala Asp Gly His Arg Ala 
1 5 10 15 
Arg


Leu


Tyr


Cys
20


Phe 


Pro 


His 


Ala 


Gly 
25 


Asp 


Leu 


Ala 
35 


Arg 


Ala 


Leu 


Ala 


Pro 
40 


Glu 


Tyr


Pro
50


Gly


Arg


Gln


Asp


Arg
55 


Arg 


Asp 


Gly


Glu


Ile


Ala 


Asp 


Glu 


Val 


Ala 


Ala 


65 










70 








Glu 


Val 


Pro 


Phe 


Ala 
85 


Leu 


Phe 


Gly 


His 


iyx 


Glu 


Thr 


Ala 


Arg


Arg 


Leu 


Glu 


Ala 






100 










105 


/\x y 


iiCU 


Phe 


Val 


Ser 


Gly


Gln


Thr 


Ala 




115 










120 




Thr 

x nr 


Asp 
130 




Pro 

t x w - 


Asp 


Glu 


Asp 
135 


Gly


Leu 


Gl V 


v a x 


Ser 


Glu 


Ala 


Ala 


Leu 


Ala 


Asp 


145 










150 








iiCU 


t X <J 


vox 


T.on 
Jjcu 


Arg 
165 


Ala 


Asp 


His 


Arg 


Gln


Ala


Gly


Pro 
180 


Pro 


Leu 


Arg 


Ala 


Gly
185 


Thr 


Asp 


Pro 
195 


Leu 


Thr 


Thr 


Val 


Glu 
200 


Asp 


Ser 


Val 
210 


Val 


Pro 


Gly 


Arg 


Thr 
215 


Arg 


Thr 


Leu 


Ala 


Asp 


His 


Val 


Gly 


Glu 


Val 


Ala 


225 










230 








Leu 


Arg 


Leu 


Thr 


Pro 


Thr 


Gly 







245 

<210> 20 
<211> 189 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 20 



Ile
1 


Arg 


Val 


Gln


Asp 
5 


Asp 


Asp 


Ala 


Asp 


Thr 


Ser 


Ile


Ala 


Leu 


Val 


Leu 


Leu 


Leu 








20 










25 


Ser 


Leu 


Ile


Gly


Ile


Gly 


Thr 


Tyr 


Leu 






35 










40 




Leu 


Ala 


Leu 


Val 


Arg 


Lys 


Asp 


Pro 


Ala 




50 










55 






Glu 


Ile


Leu 


Arg 


Tyr 


Gln


Ala 


Pro 


Pro 


65 










70 








Thr 


Ala 


Glu 


Val 


Glu 


Ile


Gly 


Gly 


Val 










85 










Val 


Leu 


Ile


Ala 


Asn 


Gly 


Ala 


Ala 


Asn 








100 










105 


Asp 


Pro 


Asp 


Arg 


Phe 


Asp 


Val 


Thr 


Arg 






115 










120 




Phe 


Gly 


His 


Gly 


Ile


His 


Tyr 


Cys 


Met 




130 










135 






Glu 


Gly 


Glu 


Val 


Ala 


Leu 


Gly 


Ala 


Leu 


145 










150 








Ser 


Leu 


Gly 


Phe 


Pro 


Ser 


Asp 


Glu 


Val 










165 










Leu 


Arg 


Gly 


Ile


Asp 


His 


Leu 


Pro 


Val 
Ala


Ala


Ala


Asp


Ser


Tyr


Leu 










30 






Val 


Asp 


Val 


Trp


Ala 


Val 


Gln








45 








Glu 


Arg


Ala 


Leu 


Gly


Thr 


Ala 






60 










Val 


Leu 


Arg 


Asp 


Leu 


Val 


Gly




75 










80 


Ser 


Met 


Gly


Ala 


Leu 


Val 

▼ ex X 


Ala 


90 










95 




Arg 


Pro 


Gly


Val 


Arg 


Pro 


XJC u 










110 






Pro 


Arg 


v a x 


xix a 


Gl ii 
ox u 


Arrr 


TV 

Arg 








125 








Val 


Gin 


Gl n 

OX 11 


i its \- 


A T*n 
rii y 


nly 


Leu 






140 










gi n 


Gl v 




Leu 


Asp 


wet 


oex 




155 










x ui/ 


Val 




Arg 




xyx: 


Al a 


Trp 


170 










X / J 




He 


Thr 


Thr 


Leu 


Cys 


Gly 


Asp 










190 






Ala 


Gin 


Arg 


Trp 


Leu 


Pro 


Tyr 








205 








Phe 


Pro 


Gly 


Gly 


His 


Phe 


Tyr 






220 










Glu 


Ser 


Val 


Ala 


Pro 


Asp 


Leu 




235 










240 






OCX 


Am 
rix y 




Gl n 

OXU 


T <=m 
Xj6U 


10 

XV 










X ~j 




Ala 


Gly


Phe 


Gl n 
ox u 




OCX 


Val 

VOX 
















Leu 


Leu 


Thr 


His 


Pro 


Asp 


Gl n 

OXIl 








45 








Leu 


Leu 


Pro 


Gly


Ala 


Val 


Glu 






60 










Glu 


Thr 


Thr 


Thr 


Arg


Phe 


Ala 




75 










80 


Thr 


Ile


Pro 


Ala 


Tyr


Ser 


Thr 


90 










95 




Arg


Asp 


Pro 


Gly


Gln


Phe 


Pro 










110 






Asp


Ser


Arg


Gly


His 


Leu 


Thr 








125 








Gly 


Arg 


Pro 


Leu 


Ala 


Lys 


Leu 






140 










Phe 


Asp 


Arg 


Phe 


Pro 


Lys 


Leu 




155 










160 


Val 


Trp 


Arg 


Arg 


Ser 


Leu 


Leu 


170 










175 




Arg 


Pro 


Asn 


Gly 
<210> 21
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic nucleotide DNA duplex 
<400> 21 

taagaattcg gagatctggc ctcagctcta gac 33

<210> 22
<211> 39
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Complementary oligo 
<400> 22 

aattgtctag agctgaggcc agatctccga attcttaat 39 
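
SEQ ID NO: 22 is described as the oligo complementary to SEQ ID NO: 21. That relationship can be checked with a short Python sketch, offered only as an illustration: it assumes standard Watson-Crick pairing and uses the two sequences exactly as printed above. The helper name `reverse_complement` is hypothetical and not part of the listing.

```python
# Illustrative check (not part of the filed listing): does SEQ ID NO: 22
# contain the reverse complement of SEQ ID NO: 21?
SEQ21 = "taagaattcggagatctggcctcagctctagac"
SEQ22 = "aattgtctagagctgaggccagatctccgaattcttaat"

# Translation table for Watson-Crick pairing of lowercase bases.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


rc21 = reverse_complement(SEQ21)
start = SEQ22.find(rc21)  # -1 would mean the oligos do not pair
five_prime_overhang = SEQ22[:start]
three_prime_overhang = SEQ22[start + len(rc21):]
print(start, five_prime_overhang, three_prime_overhang)
```

Under these assumptions the 33-base reverse complement of SEQ ID NO: 21 lies inside SEQ ID NO: 22 starting at its fifth base, leaving unpaired extensions of four bases (aatt) at the 5' end and two bases (at) at the 3' end.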

<210> 23 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 
<400> 23 

ttgcagcggt tgtcggtggc ggtgcgggag gggcgtcggg tgttgggtgt ggtggtgggt 60 

tcggcggtga atcaggatgg ggcgagtaat gggttggcgg cgccgtcggg ggtggcgcag 120 

cagcgggtga ttcggcgggc gtggggtcgt gcgggtgtgt cgggtgggga tgtgggtgtg 180

gtggaggcgc atgggacggg gacgcggttg ggggatccgg tggagttggg ggcgttgttg 240 

gggacgtatg gggtgggtcg gggtggggtg ggtccggtgg tggtgggttc ggtgaaggcg 300 

aatgtgggtc atgtgcaggc ggcggcgggt gtggtgggtg tgatcaaggt ggtgttgggg 360 

ttgggtcggg ggttggtggg tccgatggtg tgtcggggtg ggttgtcggg gttggtggat 420 

tggtcgtcgg gtgggttggt ggtggcggat ggggtgcggg ggtggccggt gggtgtggat 480 

ggggtgcgtc ggggtggggt gtcggcgttt ggggtgtcgg ggacgaat 528 

<210> 24 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 
<400> 24 

ctgcagcggt tgtcggtggc ggtgcgggag gggcgtcggg tgttgggtgt ggtggtgggt 60 

tcggcggtga atcaggatgg ggcgagtaat gggttggcgg cgccgtcggg ggtggcgcag 120 

cagcgggtga ttcggcgggc gtggggtcgt gcgggtgtgt cgggtgggga tgtgggtgtg 180 

gtggaggcgc atgggacggg gacgcggttg ggggatccgg tggagttggg ggcgttgttg 240 

gggacgtatg gggtgggtcg gggtggggtg ggtccggtgg tggtgggttc ggtgaaggcg 300 

aatgtgggtc atgtgcaggc ggcggcgggt gtggtgggtg tgatcaaggt ggtgttgggg 360 

ttgggtcggg ggttggtggg tccgatggtg tgtcggggtg ggttgtcggg gttggtggat 420 

tggtcgtcgg gtgggttggt ggtggcggat ggggtgcggg ggtggccggt gggtgtggat 480 

ggggtgcgtc ggggtggggt gtcggcgttt ggggtgtcgg ggacgaat 528 

<210> 25 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 
<220> 
<221> misc_feature
<222> (1)...(528)
<223> Sequence with codon changes as described in the specification at page 99, line 22 thru page 101, line 23

<400> 25 

ctgcagcgcc tctccgtcgc cgtccgcgag ggccgccgag tcctcggcgt cgtcgtcggc 60 

tcggccgtca accaagacgg cgcgtcaaac ggcctcgccg cgccctccgg cgtcgcccag 120 

cagcgcgtca tacgccgcgc gtggggacgc gccggagtat cgggcggcga cgtcggagtc 180 

gtcgaggccc acggcaccgg cacccgcctc ggggatcccg tcgagctggg cgccctcctg 240 

ggcacgtacg gcgtcggccg cggcggcgtc ggcccggtcg tcgtcggcag cgtcaaggcc 300 

aacgtcggcc acgtccaggc cgcggccggc gtcgtcgggg tcatcaaggt cgtcctcggc 360 

ctcggccgcg ggctggtcgg cccgatggtc tgccgcggcg gcctcagcgg cctcgtcgac 420 

tggtcgtccg gcggcctggt cgtcgcggac ggggtccgcg gctggccggt cggcgtcgac 480 

ggcgtccgcc ggggcggcgt ctcggcgttc ggcgtcagcg ggacgaat 528 
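
SEQ ID NO: 25 is annotated as a version of the preceding sequence with codon changes. A minimal Python sketch of how such changes might be inspected follows. It assumes the standard genetic code and a reading frame that starts at base 1 (neither is stated in the listing), and it compares only the first 60 bases of SEQ ID NO: 24 and SEQ ID NO: 25 as printed above. The names `CODON_TABLE` and `translate` are illustrative.

```python
# Illustrative sketch: translate the first 60 bases of SEQ ID NO: 24 and
# SEQ ID NO: 25 and compare the encoded peptides.
# Assumptions: standard genetic code, reading frame beginning at base 1.
from itertools import product

BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codons enumerated in t, c, a, g order, paired with one-letter residues.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string (spaces allowed) into one-letter residues."""
    dna = dna.replace(" ", "").lower()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


seq24_start = "ctgcagcggt tgtcggtggc ggtgcgggag gggcgtcggg tgttgggtgt ggtggtgggt"
seq25_start = "ctgcagcgcc tctccgtcgc cgtccgcgag ggccgccgag tcctcggcgt cgtcgtcggc"

pep24 = translate(seq24_start)
pep25 = translate(seq25_start)
# Count base positions that differ between the two 60-base stretches.
changed = sum(
    a != b
    for a, b in zip(seq24_start.replace(" ", ""), seq25_start.replace(" ", ""))
)
print(pep24, pep25, changed)
```

Under those assumptions both stretches translate to the same 20 residues even though the nucleotides differ, which is what a set of silent codon changes would look like. The sketch does not check the remaining bases.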

<210> 26 
<211> 291 
<212> DNA 

<213> Micromonospora megalomicea 
<400> 26 

ggtggagtgt gatgcggtgg tgtcgtcggt ggtggggttt tcggtgttgg gggtgttgga 60 
gggtcggtcg ggtgcgccgt cgttggatcg ggtggatgtg gtgcagccgg tgttgttcgt 120 
ggtgatggtg tcgttggcgc ggttgtggcg gtggtgtggg gttgtgcctg cggcggtggt 180 
gggtcattcg cagggggaga tcgcggcggc ggtggtggcg ggggtgttgt cggtgggtga 240

tggtgcgcgg gtggtggcgt tgcgggcgcg ggcgttgcgg gcgttggccg g 291 

<210> 27 
<211> 291 
<212> DNA 

<213> Micromonospora megalomicea 
<400> 27 

ggtggagtgt gatgcggtgg tgtcgtcggt ggtggggttt tcggtgttgg gggtgttgga 60 
gggtcggtcg ggtgcgccgt cgttggatcg ggtggatgtg gtgcagccgg tgttgttcgt 120 
ggtgatggtg tcgttggcgc ggttgtggcg gtggtgtggg gttgtgcctg cggcggtggt 180 
gggtcattcg cagggggaga tcgcggcggc ggtggtggcg ggggtgttgt cggtgggtga 240 
tggtgcgcgg gtggtggcgt tgcgggcgcg ggcgttgcgg gcgttggccg g 291 

<210> 28 
<211> 291 
<212> DNA 

<213> Micromonospora megalomicea 
<220> 

<221> misc_feature
<222> (1)...(291)
<223> Sequence with codon changes as described in the specification at page 99, line 22 thru page 101, line 23

<400> 28 

cgtggagtgc gatgcggtcg tgtcgagcgt cgtcggcttc agcgtgctgg gcgtcctgga 60 

gggccgcagc ggcgccccga gcctggaccg cgtcgacgtg gtccagccgg tcctgttcgt 120 

ggtcatggtc agcctggccc gcctgtggcg ctggtgcggc gtggtcccgg ccgccgtggt 180 

cggccacagc cagggcgaga tcgccgccgc ggtcgtggcc ggcgtcctga gcgtcggcga 240 

cggcgcccgc gtcgtggccc tgcgcgcccg cgccctgcgc gccctggccg g 291 

<210> 29 
<211> 24 
<212> DNA 
<213> Artificial Sequence



<220> 

<223> PCR primer 



<400> 29 

gaacaactcc tgtctgcggc cgcg 24



<210> 30 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 30 

cggaattctc tagagtcacg tctccaaccg cttgtcgagg 40 
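
Each record declares its length under `<211>` and repeats a running base count at the end of the sequence line. A small Python sketch of a consistency check is given below as an illustration only. The record text is copied from SEQ ID NO: 30 above; the function name `declared_and_actual_length` and the parsing approach are assumptions, not part of any official tool.

```python
# Illustrative sketch: compare the declared <211> length of a record with
# the number of bases actually listed under <400>.
import re
from typing import Tuple

record = """<210> 30
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> PCR primer
<400> 30
cggaattctc tagagtcacg tctccaaccg cttgtcgagg 40
"""


def declared_and_actual_length(text: str) -> Tuple[int, int]:
    """Return (length declared in <211>, number of bases after <400>)."""
    declared = int(re.search(r"<211>\s*(\d+)", text).group(1))
    # Keep only the lines that follow the <400> tag line.
    body = text.split("<400>", 1)[1].split("\n", 1)[1]
    # Drop spaces, digits and anything else that is not a base letter.
    bases = re.sub(r"[^acgtun]", "", body.lower())
    return declared, len(bases)


declared, actual = declared_and_actual_length(record)
print(declared, actual)
```

A mismatch between the two numbers would suggest a transcription or extraction error in that record rather than anything about the underlying sequence.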

<210> 31 
<211> 51 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer
<400> 31 

tctagactta attaaggagg acacatatga gcgagagcag cggcatgacc g 51 

<210> 32 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 32 

aacgcctccc aggagatctc cagca 25 

<210> 33 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligo 
<400> 33 

aattcatagc ctaggt 16 

<210> 34 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligo 
<400> 34 